= Device Specification for Inter-VM shared memory device =

The Inter-VM shared memory device (ivshmem) is designed to share a
memory region between multiple QEMU processes running different guests
and the host. In order for all guests to be able to pick up the
shared memory area, it is modeled by QEMU as a PCI device exposing
said memory to the guest as a PCI BAR.

The device can use a shared memory object on the host directly, or it
can obtain one from an ivshmem server.

In the latter case, the device can additionally interrupt its peers, and
get interrupted by its peers.


== Configuring the ivshmem PCI device ==

There are two basic configurations:

- Just shared memory: -device ivshmem-plain,memdev=HMB,...

  This uses host memory backend HMB. It should have option "share"
  set.

- Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,...

  An ivshmem server must already be running on the host. The device
  connects to the server's UNIX domain socket via character device
  CHR.

  Each peer gets assigned a unique ID by the server. IDs must be
  between 0 and 65535.

  Interrupts are message-signaled (MSI-X). vectors=N configures the
  number of vectors to use.

For more details on ivshmem device properties, see The QEMU Emulator
User Documentation (qemu-doc.*).
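
As an illustration of the two configurations, here is a minimal sketch
that assembles the corresponding QEMU arguments. The function names,
the IDs "hostmem" and "ivshmem_chr", and the example paths are
assumptions made for this sketch. The device name for the interrupt
case is a parameter, because this document uses both "ivshmem" and
"ivshmem-doorbell" for it.

```python
def ivshmem_plain_args(mem_path, size, memdev_id="hostmem"):
    """Arguments for the shared-memory-only configuration.

    The host memory backend is created with share=on, matching the
    requirement that option "share" be set.
    """
    return [
        "-object",
        f"memory-backend-file,id={memdev_id},mem-path={mem_path},"
        f"size={size},share=on",
        "-device",
        f"ivshmem-plain,memdev={memdev_id}",
    ]


def ivshmem_interrupt_args(socket_path, vectors, device="ivshmem",
                           chardev_id="ivshmem_chr"):
    """Arguments for shared memory plus interrupts.

    socket_path is the UNIX domain socket of an ivshmem server that is
    already running (hypothetical path in the usage example).
    """
    return [
        "-chardev",
        f"socket,path={socket_path},id={chardev_id}",
        "-device",
        f"{device},chardev={chardev_id},vectors={vectors}",
    ]
```

For example, ivshmem_plain_args("/dev/shm/ivshmem", "1M") yields the
-object and -device arguments for a 1 MiB region backed by a file in
/dev/shm.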


== The ivshmem PCI device's guest interface ==

The device has vendor ID 1af4, device ID 1110, revision 1. Before
QEMU 2.6.0, it had revision 0.

=== PCI BARs ===

The ivshmem PCI device has two or three BARs:

- BAR0 holds device registers (256 Byte MMIO)
- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
- BAR2 maps the shared memory object

There are two ways to use this device:

- If you only need the shared memory part, BAR2 suffices. This way,
  you have access to the shared memory in the guest and can use it as
  you see fit. Memnic, for example, uses ivshmem this way from guest
  user space (see http://dpdk.org/browse/memnic).

- If you additionally need the capability for peers to interrupt each
  other, you need BAR0 and BAR1. You will most likely want to write a
  kernel driver to handle interrupts. This requires the device to be
  configured for interrupts.
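
The shared-memory-only use from guest user space can be sketched as
follows, assuming a Linux guest that exposes the BAR as a sysfs
resource file. The function name is an assumption of this sketch, and
the PCI address in the usage note is hypothetical.

```python
import mmap
import os


def map_shared_memory(resource_path, length=0):
    """Map a BAR resource file read/write and return the mmap object.

    length=0 maps the whole file. Whether the file reports the BAR size
    depends on the guest kernel; pass an explicit length if it does not.
    """
    fd = os.open(resource_path, os.O_RDWR)
    try:
        return mmap.mmap(fd, length, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # The mapping remains valid after the descriptor is closed.
        os.close(fd)
```

In a guest, the path would be something like
/sys/bus/pci/devices/0000:00:04.0/resource2 (hypothetical address),
where the trailing 2 selects BAR2.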

Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
configured for interrupts. It becomes safely accessible only after
the ivshmem server has provided the shared memory. These devices have
PCI revision 0 rather than 1. Guest software should wait for the
IVPosition register (described below) to become non-negative before
accessing BAR2.

Revision 0 of the device cannot tell guest software whether it is
configured for interrupts.

=== PCI device registers ===

BAR 0 contains the following registers:

    Offset  Size  Access      On reset  Function
         0     4  read/write         0  Interrupt Mask
                                        bit 0: peer interrupt (rev 0)
                                               reserved (rev 1)
                                        bit 1..31: reserved
         4     4  read/write         0  Interrupt Status
                                        bit 0: peer interrupt (rev 0)
                                               reserved (rev 1)
                                        bit 1..31: reserved
         8     4  read-only    0 or ID  IVPosition
        12     4  write-only       N/A  Doorbell
                                        bit 0..15: vector
                                        bit 16..31: peer ID
        16   240  none             N/A  reserved

Software should only access the registers as specified in column
"Access". Reserved bits should be ignored on read, and preserved on
write.
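
A minimal sketch of the register layout and of a write that preserves
reserved bits. The constant and function names are assumptions, as is
the little-endian register encoding. A plain bytearray stands in for
the mapped BAR0; real MMIO needs single aligned 32-bit accesses, which
this sketch does not guarantee.

```python
import struct

# Register offsets within BAR0, taken from the table above.
INTR_MASK = 0
INTR_STATUS = 4
IV_POSITION = 8
DOORBELL = 12


def read32(bar0, offset):
    """Read a 32-bit register (little-endian encoding is an assumption)."""
    return struct.unpack_from("<I", bar0, offset)[0]


def write32(bar0, offset, value):
    """Write a 32-bit register."""
    struct.pack_into("<I", bar0, offset, value & 0xFFFFFFFF)


def set_mask_bit0(bar0, enable):
    """Set or clear bit 0 of Interrupt Mask with a read-modify-write,
    so that the reserved bits 1..31 are written back unchanged."""
    old = read32(bar0, INTR_MASK)
    new = (old | 1) if enable else (old & ~1)
    write32(bar0, INTR_MASK, new)
    return new
```

The read-modify-write in set_mask_bit0() is one way to follow the
"preserved on write" rule for reserved bits.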

In revision 0 of the device, the Interrupt Status and Interrupt Mask
registers together control the legacy INTx interrupt when the device
has no MSI-X capability: INTx is asserted when the bit-wise AND of
Status and Mask is non-zero. Interrupt Status Register bit 0 becomes 1
when an interrupt request from a peer is received. Reading the
register clears it.
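
The revision 0 behavior described above can be modeled in a few lines.
This is a toy model for illustration, not device code; the class and
method names are assumptions.

```python
class Rev0Interrupts:
    """Toy model of the revision 0 Status/Mask pair without MSI-X."""

    def __init__(self):
        self.mask = 0    # Interrupt Mask, 0 on reset
        self.status = 0  # Interrupt Status, 0 on reset

    def peer_interrupt(self):
        """A peer requested an interrupt: Status bit 0 becomes 1."""
        self.status |= 1

    def intx_asserted(self):
        """INTx is asserted when (Status AND Mask) is non-zero."""
        return (self.status & self.mask) != 0

    def read_status(self):
        """Reading Interrupt Status returns its value and clears it."""
        value = self.status
        self.status = 0
        return value
```

With mask left at 0, a peer interrupt sets the status bit but does not
assert INTx; setting mask to 1 asserts it until the status register is
read.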

IVPosition Register: if the device is not configured for interrupts,
this is zero. Otherwise, it is the device's ID (between 0 and 65535).

Before QEMU 2.6.0, the register may read -1 for a short while after
reset. These devices have PCI revision 0 rather than 1.

There is no good way for software to find out whether the device is
configured for interrupts. A positive IVPosition means interrupts,
but zero is ambiguous: it can mean no interrupts, or interrupts with
ID zero.
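
The three cases can be told apart by reading the register as a signed
32-bit value. A minimal sketch, with function names and return strings
chosen for illustration only:

```python
import struct


def decode_ivposition(raw):
    """Reinterpret the raw 32-bit register value as a signed integer."""
    return struct.unpack("<i", struct.pack("<I", raw & 0xFFFFFFFF))[0]


def describe_ivposition(raw):
    """Classify an IVPosition reading according to the rules above."""
    pos = decode_ivposition(raw)
    if pos < 0:
        # Only seen on revision 0 devices shortly after reset.
        return "not ready"
    if pos > 0:
        return "interrupts, ID %d" % pos
    # Zero: either no interrupts, or interrupts with ID zero.
    return "ambiguous"
```

Guest software for revision 0 devices would poll until the result is
no longer "not ready" before touching BAR2.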

Doorbell Register: writing this register requests to interrupt a peer.
The written value's high 16 bits are the ID of the peer to interrupt,
and its low 16 bits select an interrupt vector.
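
The value to write can be computed as below. The function names are
assumptions of this sketch; the bit layout is the one given in the
register table.

```python
def doorbell_value(peer_id, vector):
    """Build a Doorbell value: peer ID in bits 16..31, vector in 0..15."""
    if not 0 <= peer_id <= 0xFFFF:
        raise ValueError("peer ID must be between 0 and 65535")
    if not 0 <= vector <= 0xFFFF:
        raise ValueError("vector must fit in 16 bits")
    return (peer_id << 16) | vector


def split_doorbell(value):
    """Inverse of doorbell_value(): return (peer_id, vector)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF
```

For example, interrupting peer 3 on vector 1 means writing 0x00030001
to the register.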

If the device is not configured for interrupts, the write is ignored.

If the interrupt hasn't completed setup, the write is ignored. The
device cannot tell guest software whether setup is complete.
Interrupts can regress to this state on migration.

If the peer with the requested ID isn't connected, or it has fewer
interrupt vectors connected, the write is ignored. The device cannot
tell guest software what peers are connected, or how many interrupt
vectors are connected.

Otherwise, the peer's interrupt for this vector becomes pending. There
is no way for software to clear the pending bit, and a polling mode of
operation is therefore impossible.

If the peer is a revision 0 device without MSI-X capability, its
Interrupt Status register is set to 1. This asserts INTx unless
masked by the Interrupt Mask register. The device cannot communicate
the interrupt vector to guest software then.

With multiple MSI-X vectors, different vectors can be used to indicate
different events have occurred. The semantics of interrupt vectors
are left to the application.


== Interrupt infrastructure ==

When configured for interrupts, the peers share eventfd objects in
addition to shared memory. The shared resources are managed by an
ivshmem server.

=== The ivshmem server ===

The server listens on a UNIX domain socket.

For each new client that connects to the server, the server
- picks an ID,
- creates eventfd file descriptors for the interrupt vectors,
- sends the ID and the file descriptor for the shared memory to the
  new client,
- sends connect notifications for the new client to the other clients
  (these contain file descriptors for sending interrupts),
- sends connect notifications for the other clients to the new client,
  and
- sends interrupt setup messages to the new client (these contain file
  descriptors for receiving interrupts).
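
The part of these steps that concerns the new client can be sketched
as a function listing what the server sends to it. The concrete values
(version 0, the number -1 for the shared memory) are taken from the
Client-Server Protocol section of this document. The function name and
the string labels standing in for file descriptors are assumptions,
and the sketch assumes every client uses the same number of vectors.

```python
PROTOCOL_VERSION = 0


def messages_for_new_client(new_id, existing_peer_ids, nvectors):
    """Return the (number, fd_label) sequence sent to a new client.

    fd_label is a descriptive string standing in for a real file
    descriptor, or None when the message carries none.
    """
    msgs = [
        (PROTOCOL_VERSION, None),  # protocol version
        (new_id, None),            # the client's own ID
        (-1, "shm"),               # shared memory file descriptor
    ]
    # Connect notifications for the other clients: one message per
    # vector, each carrying an fd for interrupting that peer.
    for peer in existing_peer_ids:
        for vec in range(nvectors):
            msgs.append((peer, f"peer{peer}-vec{vec}"))
    # Interrupt setup: the client's own ID, one message per vector,
    # each carrying an fd for receiving interrupts.
    for vec in range(nvectors):
        msgs.append((new_id, f"peer{new_id}-vec{vec}"))
    return msgs
```

The notifications sent to the already connected clients are left out
here; they have the same shape as the per-peer messages above.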

The first client to connect to the server receives ID zero.

When a client disconnects from the server, the server sends disconnect
notifications to the other clients.

The next section describes the protocol in detail.

If the server terminates without sending disconnect notifications for
its connected clients, the clients can elect to continue. They can
communicate with each other normally, but won't receive disconnect
notification on disconnect, and no new clients can connect. There is
no way for the clients to connect to a restarted server. The device
cannot tell guest software whether the server is still up.

Example server code is in contrib/ivshmem-server/. It is not meant to
be used in production. It assumes all clients use the same number of
interrupt vectors.

A standalone client is in contrib/ivshmem-client/. It can be useful
for debugging.

=== The ivshmem Client-Server Protocol ===

An ivshmem device configured for interrupts connects to an ivshmem
server. This section details the protocol between the two.

The connection is one-way: the server sends messages to the client.
Each message consists of a single 8 byte little-endian signed number,
and may be accompanied by a file descriptor via SCM_RIGHTS. Both
client and server close the connection on error.
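
A minimal sketch of this message format, using Python's socket module
on a UNIX domain stream socket. The function names are assumptions. A
short read is simply treated as an error here; a robust client would
keep reading until it has all 8 bytes.

```python
import array
import socket
import struct

MSG = struct.Struct("<q")  # one 8-byte little-endian signed number


def send_message(sock, number, fd=None):
    """Send one message, optionally passing fd via SCM_RIGHTS."""
    ancdata = []
    if fd is not None:
        ancdata = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                    array.array("i", [fd]))]
    sock.sendmsg([MSG.pack(number)], ancdata)


def recv_message(sock):
    """Receive one message; return (number, fd), with fd None if absent."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        MSG.size, socket.CMSG_LEN(fds.itemsize))
    if len(data) != MSG.size:
        raise ConnectionError("EOF or short read")
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before converting.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    (number,) = MSG.unpack(data)
    return number, (fds[0] if fds else None)
```

The received descriptor is a new descriptor in the receiving process
that refers to the same open file as the one the sender passed.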

Note: QEMU currently doesn't close the connection immediately on
error, but only when the character device is destroyed.

On connect, the server sends the following messages in order:

1. The protocol version number, currently zero. The client should
   close the connection on receipt of versions it can't handle.

2. The client's ID. This is unique among all clients of this server.
   IDs must be between 0 and 65535, because the Doorbell register
   provides only 16 bits for them.

3. The number -1, accompanied by the file descriptor for the shared
   memory.

4. Connect notifications for existing other clients, if any. This is
   a peer ID (number between 0 and 65535 other than the client's ID),
   repeated N times. Each repetition is accompanied by one file
   descriptor. These are for interrupting the peer with that ID using
   vector 0,..,N-1, in order. If the client is configured for fewer
   vectors, it closes the extra file descriptors. If it is configured
   for more, the extra vectors remain unconnected.

5. Interrupt setup. This is the client's own ID, repeated N times.
   Each repetition is accompanied by one file descriptor. These are
   for receiving interrupts from peers using vector 0,..,N-1, in
   order. If the client is configured for fewer vectors, it closes
   the extra file descriptors. If it is configured for more, the
   extra vectors remain unconnected.

From then on, the server sends these kinds of messages:

6. Connection / disconnection notification. This is a peer ID.

   - If the number comes with a file descriptor, it's a connection
     notification, exactly like in step 4.

   - Else, it's a disconnection notification for the peer with that ID.
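
The client side of steps 1 to 6 can be sketched as a small state
machine that is fed one (number, fd) pair per message. This is an
illustration under assumptions, not QEMU's implementation: the class
and attribute names are made up, and extra file descriptors are only
recorded rather than actually closed.

```python
class ClientState:
    """Tracks what a client learns from the server's messages."""

    def __init__(self, nvectors):
        self.nvectors = nvectors  # vectors this client is configured for
        self.version = None
        self.own_id = None
        self.shm_fd = None
        self.peers = {}           # peer ID -> fds for interrupting it
        self.own_vectors = []     # fds for receiving interrupts
        self.closed_fds = []      # extras a real client would close

    def handle(self, number, fd=None):
        if self.version is None:                      # step 1
            if number != 0:
                raise ValueError("unsupported protocol version")
            self.version = number
        elif self.own_id is None:                     # step 2
            if not 0 <= number <= 65535:
                raise ValueError("invalid ID")
            self.own_id = number
        elif self.shm_fd is None:                     # step 3
            if number != -1 or fd is None:
                raise ValueError("expected shared memory")
            self.shm_fd = fd
        elif fd is None:                              # step 6, disconnect
            self.peers.pop(number, None)
        else:                                         # steps 4, 5, 6
            if number == self.own_id:
                vectors = self.own_vectors
            else:
                vectors = self.peers.setdefault(number, [])
            if len(vectors) < self.nvectors:
                vectors.append(fd)
            else:
                self.closed_fds.append(fd)
```

A message that carries the client's own ID together with a file
descriptor is interrupt setup; any other ID with a file descriptor is
a connect notification.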

Known bugs:

* The protocol changed incompatibly in QEMU 2.5. Before, messages
  were native endian long, and there was no version number.

* The protocol is poorly designed.

=== The ivshmem Client-Client Protocol ===

An ivshmem device configured for interrupts receives eventfd file
descriptors for interrupting peers and getting interrupted by peers
from the server, as explained in the previous section.

To interrupt a peer, the device writes the 8-byte integer 1 in native
byte order to the respective file descriptor.

To receive an interrupt, the device reads and discards as many 8-byte
integers as it can.
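
Both directions can be sketched with plain reads and writes on the
file descriptors. The function names are assumptions, and the receive
side assumes the descriptor has been put in non-blocking mode.

```python
import os
import struct

ONE = struct.pack("=Q", 1)  # the 8-byte integer 1 in native byte order


def interrupt_peer(fd):
    """Signal a peer by writing the 8-byte integer 1 to its fd."""
    os.write(fd, ONE)


def drain_interrupts(fd):
    """Read and discard 8-byte integers until fd would block.

    Returns the number of successful reads. fd must be non-blocking.
    """
    count = 0
    while True:
        try:
            chunk = os.read(fd, 8)
        except BlockingIOError:
            return count
        if not chunk:
            return count  # end of file
        count += 1
```

On a real eventfd, several writes accumulate in one counter, so a
single read typically drains them all; the loop still ends as soon as
a read would block.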