scsi: core: avoid host-wide host_busy counter for scsi_mq

It is no longer necessary to check the host depth in scsi_queue_rq():
blk-mq already enforces it when it allocates a driver tag, before
scsi_queue_rq() is called.

Many LUNs may be attached to the same host, and per-host IOPS may reach
millions, so we should avoid expensive atomic operations on the host-wide
counter in the I/O path.

This patch implements scsi_host_busy() via blk_mq_tagset_busy_iter(),
using one SCSI command state to count the busy I/Os for scsi_mq.
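
As a rough illustration only, and not the code in this patch, the idea can
be modelled in userspace C: the I/O path touches only per-command state,
and the busy count is derived on demand by walking the tag set. All
model_* names, the fixed depth of 8 and the boolean in-flight flag are
assumptions made for this sketch; they are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAN_QUEUE 8	/* hypothetical host queue depth */

/* Hypothetical stand-in for a SCSI command backed by a driver tag. */
struct model_cmd {
	atomic_bool inflight;	/* per-command state: set on dispatch, cleared on completion */
};

/* Hypothetical stand-in for the host's tag set: one command slot per tag. */
struct model_host {
	struct model_cmd cmds[MODEL_CAN_QUEUE];
};

typedef void (*model_iter_fn)(struct model_cmd *cmd, void *data);

/* Visit every command slot in the tag set and hand it to the callback. */
static void model_tagset_iter(struct model_host *host, model_iter_fn fn,
			      void *data)
{
	for (size_t tag = 0; tag < MODEL_CAN_QUEUE; tag++)
		fn(&host->cmds[tag], data);
}

/* Callback: bump the caller's count if this command is marked in flight. */
static void model_count_inflight(struct model_cmd *cmd, void *data)
{
	unsigned int *count = data;

	if (atomic_load(&cmd->inflight))
		(*count)++;
}

/* Slow path: compute the busy count when asked instead of maintaining it. */
static unsigned int model_host_busy(struct model_host *host)
{
	unsigned int count = 0;

	model_tagset_iter(host, model_count_inflight, &count);
	return count;
}

/* Fast path: only the command's own flag is written, no shared counter. */
static void model_dispatch(struct model_host *host, size_t tag)
{
	assert(tag < MODEL_CAN_QUEUE);
	atomic_store(&host->cmds[tag].inflight, true);
}

static void model_complete(struct model_host *host, size_t tag)
{
	assert(tag < MODEL_CAN_QUEUE);
	atomic_store(&host->cmds[tag].inflight, false);
}
```

In this model the cost moves from every dispatch and completion to the
rare reader of the busy count, which has to walk all tags. The value it
returns is a snapshot and may be stale by the time the caller uses it.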

IOPS is observed to increase by 15% in an I/O test on scsi_debug (32
LUNs, 32 submit queues, can_queue of 1024, libaio/dio) on a dual-socket
system.

Cc: Jens Axboe <[email protected]>
Cc: Ewan D. Milne <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Kashyap Desai <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Laurence Oberman <[email protected]>
Cc: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>