vdpa/mlx5: Parallelize device resume
Currently device resume works on vqs serially. Building up on previous
changes that converted vq operations to the async api, this patch
parallelizes the device resume.
For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM,
32 CPUs x 2 threads per core), the device resume time is reduced from
~16 ms to ~4.5 ms.
Signed-off-by: Dragos Tatulea <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Acked-by: Eugenio PĂ©rez <[email protected]>
Message-Id: <
20240816090159.
1967650[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Tested-by: Lei Yang <[email protected]>