John Garry [Thu, 28 Feb 2019 14:51:00 +0000 (22:51 +0800)]
scsi: hisi_sas: Set PHY linkrate when disconnected
When the PHY comes down, we currently do not set the negotiated linkrate:
root@(none)$ pwd
/sys/class/sas_phy/phy-0:0
root@(none)$ more enable
1
root@(none)$ more negotiated_linkrate
12.0 Gbit
root@(none)$ echo 0 > enable
root@(none)$ more negotiated_linkrate
12.0 Gbit
root@(none)$
This patch fixes the driver code to set it properly when the PHY comes
down.
If the PHY had been enabled, then set unknown; otherwise, flag as disabled.
The logical place to set the negotiated linkrate for this scenario is the
PHY down routine, which is called from the PHY down ISR.
However, it is not possible to know if the PHY comes down due to PHY
disable or loss of link, as sas_phy.enabled member is not set until after
the transport disable routine is complete, which races with the PHY down
ISR.
As an imperfect solution, use sas_phy_data.enable as the flag to know if
the PHY is down due to disable. It's imperfect, as sas_phy_data is internal
to libsas.
I can't see another way without adding a new field to hisi_sas_phy and
managing it, or changing SCSI SAS transport.
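The rule described above ("if the PHY had been enabled, then set unknown; otherwise, flag as disabled") can be sketched as a small standalone function. This is only an illustration: the enum and function names below are hypothetical stand-ins, not the definitions used by the driver or by libsas.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SAS link rate values; not the kernel's enum. */
enum demo_linkrate {
	DEMO_LINK_RATE_UNKNOWN,
	DEMO_PHY_DISABLED,
	DEMO_LINK_RATE_12_0_GBPS,
};

/*
 * Pick the negotiated linkrate to report once the PHY has gone down.
 * 'enabled' models the flag that says whether the PHY is still enabled.
 */
enum demo_linkrate demo_linkrate_on_phy_down(bool enabled)
{
	/* Link lost while enabled: rate unknown. Otherwise the PHY was disabled. */
	return enabled ? DEMO_LINK_RATE_UNKNOWN : DEMO_PHY_DISABLED;
}
```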
Xiaofei Tan [Thu, 28 Feb 2019 14:50:59 +0000 (22:50 +0800)]
scsi: hisi_sas: print PHY RX errors count for later revision of v3 hw
Later revisions of v3 hw add time-based interrupt coalescing for PHY RX
errors. We set the coalesce time to 1s, and then print the PHY RX error
count whenever PHY RX errors happen, without having to worry about
flooding the log.
In addition, we use hisi_sas_phy.lock to protect the error count values,
because we update them by calling phy_get_events_v3_hw(), which is also
used by the core driver (for the get PHY events function).
We relocate phy_get_events_v3_hw() to avoid a forward declaration.
Xiang Chen [Thu, 28 Feb 2019 14:50:58 +0000 (22:50 +0800)]
scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
Internal IO and SMP IO each have a time-out timer. The timer handler
checks whether the IO is done according to the task state flags, which are
protected by task->task_state_lock.
There is an issue which may cause the system to hang: an internal IO or
SMP IO is sent, but because of a hardware exception (such as an injected
2-bit ECC error) the IO neither completes nor has yet timed out. At that
point, a SAS controller reset occurs to recover the system. The reset
releases the resources and sets the status of the IO to
SAS_TASK_STATE_DONE, so when the IO later times out, its completion is
never completed and the waiter waits forever.
Dan Carpenter [Wed, 20 Feb 2019 05:39:13 +0000 (08:39 +0300)]
scsi: qla2xxx: check for kstrtol() failure
The error handling was unintentionally left out so it introduces a Smatch
static checker warning:
drivers/scsi/qla2xxx/qla_attr.c:1655 qla2x00_port_speed_store()
error: uninitialized symbol 'type'.
Fixes: a7b9ca7fc87a ("scsi: qla2xxx: Add support for setting port speed")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
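The class of bug reported by Smatch can be sketched in userspace. kstrtol() is kernel-only, so the example below uses a loose analogue built on strtol(); unlike kstrtol() it does not accept a trailing newline. All demo_* names are hypothetical. The point is only that the caller must check the return value before reading the parsed variable.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Loose userspace analogue of kstrtol(): returns 0 and stores the value on
 * success, or a negative errno and leaves *out untouched on failure.
 */
int demo_strtol(const char *s, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(s, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s || *end != '\0')
		return -EINVAL;
	*out = val;
	return 0;
}

/* Store handler sketch: bail out before 'type' is ever read on failure. */
int demo_port_speed_store(const char *buf, long *speed)
{
	long type;
	int rval = demo_strtol(buf, &type);

	if (rval)
		return rval;	/* 'type' was never written, so do not use it */
	*speed = type;
	return 0;
}
```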
Arnd Bergmann [Mon, 4 Mar 2019 19:39:09 +0000 (20:39 +0100)]
scsi: lpfc: fix 32-bit format string warning
On 32-bit architectures, we see a warning when %ld is used to print a
size_t:
In file included from drivers/scsi/lpfc/lpfc_init.c:62:
drivers/scsi/lpfc/lpfc_init.c: In function 'lpfc_new_io_buf':
drivers/scsi/lpfc/lpfc_logmsg.h:62:45: error: format '%ld' expects argument of type 'long int', but argument 5 has type 'unsigned int' [-Werror=format=]
This is harmless, but portable code should just use %zd to avoid the
warning.
Fixes: 0794d601d174 ("scsi: lpfc: Implement common IO buffers between NVME and SCSI")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
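A minimal userspace sketch of the portable form follows. The commit message mentions %zd; in ISO C the 'z' length modifier is what makes the specifier track the width of size_t, and "%zu" is the unsigned variant that matches size_t exactly. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t portably; "%zu" matches size_t on 32- and 64-bit targets. */
int demo_format_size(char *buf, size_t buflen, size_t value)
{
	return snprintf(buf, buflen, "size=%zu", value);
}
```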
Vasily Averin [Thu, 21 Feb 2019 15:23:17 +0000 (18:23 +0300)]
scsi: libiscsi: fall back to sendmsg for slab pages
In the "XFS over network block device" scenario, XFS can create IO requests
with slab-based XFS metadata. While processing such requests,
tcp_sendpage() can merge skb fragments with neighbouring slab objects.
If the receiving side is located on the same host, tcp_recvmsg() can
trigger a BUG_ON in the hardening check and crash the host with the
following message:
This patch redirects such requests from the sendpage path to the sendmsg
path. The problem is similar to the one described in recent commit
7e241f647dc7 ("libceph: fall back to sendmsg for slab pages")
Arnd Bergmann [Mon, 4 Mar 2019 19:39:10 +0000 (20:39 +0100)]
scsi: qla2xxx: avoid printf format warning
Depending on the target architecture and configuration, both phys_addr_t
and dma_addr_t may be smaller than 'long long', so we get a warning when
printing either of them using the %llx format string:
drivers/scsi/qla2xxx/qla_iocb.c: In function 'qla24xx_walk_and_build_prot_sglist':
drivers/scsi/qla2xxx/qla_iocb.c:1140:46: error: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type 'dma_addr_t' {aka 'unsigned int'} [-Werror=format=]
"%s: page boundary crossing (phys=%llx len=%x)\n",
~~~^
%x
__func__, sle_phys, sg->length);
~~~~~~~~
drivers/scsi/qla2xxx/qla_iocb.c:1180:29: error: format '%llx' expects argument of type 'long long unsigned int', but argument 7 has type 'dma_addr_t' {aka 'unsigned int'} [-Werror=format=]
"%s: sg[%x] (phys=%llx sglen=%x) ldma_sg_len: %x dif_bundl_len: %x ldma_needed: %x\n",
~~~^
There are special %pad and %pap format strings in Linux that we could use
here, but since the driver already does 64-bit arithmetic on the values,
using a plain 'u64' seems more consistent here.
Note: A possible related issue may be that the driver possibly checks the
wrong kind of overflow: when an IOMMU is in use, buffers that cross a
32-bit boundary in physical addresses would still be mapped into dma
addresses within the low 4GB space, so I suspect that we actually want to
check sg_dma_address() instead of sg_phys() here.
Fixes: 50b812755e97 ("scsi: qla2xxx: Fix DMA error when the DIF sg buffer crosses 4GB boundary")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
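The "plain u64" approach can be illustrated in userspace as below. The typedef models a configuration where the address type is 32 bits wide; it is a hypothetical stand-in, since the real dma_addr_t width depends on the kernel configuration. Widening the value before printing keeps the argument in step with "%llx" regardless of that width.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Models a 32-bit configuration; hypothetical stand-in for dma_addr_t. */
typedef uint32_t demo_dma_addr_t;

int demo_format_phys(char *buf, size_t buflen, demo_dma_addr_t phys,
		     unsigned int len)
{
	/* Widen explicitly so the argument always matches "%llx". */
	unsigned long long wide = phys;

	return snprintf(buf, buflen, "phys=%llx len=%x", wide, len);
}
```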
James Smart [Mon, 4 Mar 2019 23:27:51 +0000 (15:27 -0800)]
scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset
The patch that replaced io channels for hdw_queues now reports the
following static checker warning:
drivers/scsi/lpfc/lpfc_init.c:11136 lpfc_sli4_hba_unset()
error: we previously assumed 'phba->pport' could be null (see line 11074)
Resolve by adding a pport NULL check.
[mkp: tag tweak]
Fixes: cdb42becdd40 ("scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
The outer routine lpfc_sli_issue_iocb(), which decomposes into the
SLI3 (s3) or SLI4 (s4) subroutines, takes out the locks. For s3, it takes
out the hbalock. For s4, it takes out the ring_lock. The lockdep checks in
the s3 and s4 subroutines both check hbalock, which is incorrect for s4.
Revise the s4 subroutine to lockdep check the ring_lock.
Bill Kuzeja [Mon, 4 Mar 2019 13:25:46 +0000 (08:25 -0500)]
scsi: qla2xxx: Fix panic in qla_dfs_tgt_counters_show
When trying to display tgt_counters in the debugfs, a panic can result.
There is no null check for qpair after it is assigned in the for-loop.
Unless vha->hw->queue_pair_map array is completely filled with entries, the
system will panic dereferencing a null pointer.
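The shape of the fix can be sketched as a loop over a sparsely populated pointer array. The struct and function names are hypothetical; only the "skip NULL slots before dereferencing" pattern is taken from the description above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queue pair holding one counter. */
struct demo_qpair {
	unsigned long cmd_count;
};

/* Sum a counter over a sparse pointer map, skipping empty slots. */
unsigned long demo_sum_counters(struct demo_qpair *const *map, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct demo_qpair *qpair = map[i];

		if (!qpair)
			continue;	/* slot not populated: nothing to dereference */
		total += qpair->cmd_count;
	}
	return total;
}
```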
Steve Sistare [Fri, 1 Mar 2019 14:46:28 +0000 (06:46 -0800)]
scsi: megaraid_sas: reduce module load time
megaraid_sas takes 1+ seconds to load while waiting for firmware:
[2.822603] megaraid_sas 0000:03:00.0: Waiting for FW to come to ready state
[3.871003] megaraid_sas 0000:03:00.0: FW now in Ready state
This is due to the following loop in megasas_transition_to_ready(), which
waits a minimum of 1 second, even though the FW becomes ready in tens of
millisecs:
/*
* The cur_state should not last for more than max_wait secs
*/
for (i = 0; i < max_wait; i++) {
...
msleep(1000);
...
dev_info(&instance->pdev->dev, "FW now in Ready state\n");
This is a regression, caused by a change of the msleep granularity from 1
to 1000 due to concern about waiting too long on systems with coarse
jiffies.
To fix, increase iterations and use msleep(20), which results in:
[2.670627] megaraid_sas 0000:03:00.0: Waiting for FW to come to ready state
[2.739386] megaraid_sas 0000:03:00.0: FW now in Ready state
Fixes: fb2f3e96d80f ("scsi: megaraid_sas: Fix msleep granularity")
Signed-off-by: Steve Sistare <[email protected]>
Acked-by: Sumit Saxena <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
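The effect of the sleep granularity can be modelled without actually sleeping. The sketch below is a simplified model of a "check, then sleep" poll loop, not the driver's code; the function name and the idea of passing the ready time as a parameter are assumptions made for illustration.

```c
/*
 * Model a poll loop that checks a condition and then sleeps 'interval_ms'
 * (which must be > 0) between checks. Returns the elapsed time at which the
 * loop notices that the condition became true at 'ready_ms', or -1 if
 * 'max_wait_ms' expires first.
 */
long demo_poll_latency_ms(long ready_ms, long interval_ms, long max_wait_ms)
{
	long now;

	for (now = 0; now <= max_wait_ms; now += interval_ms) {
		if (now >= ready_ms)
			return now;
	}
	return -1;
}
```

With an illustrative firmware ready time of 50 ms, a 1000 ms interval notices readiness only at 1000 ms, while a 20 ms interval notices it at 60 ms, which is the same order of magnitude as the before and after timestamps quoted above.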
Cathy Avery [Thu, 28 Feb 2019 19:28:24 +0000 (14:28 -0500)]
scsi: target: tcmu: wait for nl reply only if there are listeners or during an add
genlmsg_multicast_allns now returns the correct statuses when a message is
sent to a listener. However, in the case of adding a device we want to wait
for the listener, otherwise we may miss the device during startup.
Felipe Franciosi [Wed, 27 Feb 2019 16:10:34 +0000 (16:10 +0000)]
scsi: virtio_scsi: don't send sc payload with tmfs
The virtio scsi spec defines struct virtio_scsi_ctrl_tmf as a set of
device-readable records and a single device-writable response entry:
struct virtio_scsi_ctrl_tmf
{
// Device-readable part
le32 type;
le32 subtype;
u8 lun[8];
le64 id;
// Device-writable part
u8 response;
}
The above should be organised as two descriptor entries (or potentially
more if using VIRTIO_F_ANY_LAYOUT), but without any extra data after "le64
id" or after "u8 response".
The Linux driver doesn't respect that, with virtscsi_abort() and
virtscsi_device_reset() setting cmd->sc before calling virtscsi_tmf(). It
results in the original scsi command payload (or writable buffers) added to
the tmf.
This fixes the problem by leaving cmd->sc zeroed out, which makes
virtscsi_kick_cmd() add the tmf to the control vq without any payload.
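The two-descriptor split described above can be sketched with offsetof(). Fixed-width C types stand in for le32/le64/u8, and struct demo_desc is a hypothetical simplification that only records an offset, a length and a direction; it is not the virtio descriptor format.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout from the text, with fixed-width types standing in for le32/le64/u8. */
struct demo_ctrl_tmf {
	/* Device-readable part */
	uint32_t type;
	uint32_t subtype;
	uint8_t lun[8];
	uint64_t id;
	/* Device-writable part */
	uint8_t response;
};

/* Hypothetical descriptor: offset and length in the struct, plus direction. */
struct demo_desc {
	size_t offset;
	size_t len;
	int device_writable;
};

/* Fill exactly two descriptors: the request fields, then the response byte. */
void demo_tmf_descs(struct demo_desc d[2])
{
	d[0].offset = 0;
	d[0].len = offsetof(struct demo_ctrl_tmf, response);
	d[0].device_writable = 0;

	d[1].offset = offsetof(struct demo_ctrl_tmf, response);
	d[1].len = sizeof(((struct demo_ctrl_tmf *)0)->response);
	d[1].device_writable = 1;
}
```

Nothing follows "id" in the first descriptor or "response" in the second, which is the property the fix restores.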
Erwan Velu [Fri, 1 Mar 2019 16:08:06 +0000 (17:08 +0100)]
scsi: smartpqi: Reporting 'logical unit failure'
When the HARDWARE_ERROR/0x3e/0x1 case is triggered, the logical volume
is offlined. When reading the kernel log, the reason why the device
got offlined isn't reported to the user. This situation makes it
difficult for admins to root cause.
Varun Prakash [Thu, 21 Feb 2019 14:42:16 +0000 (20:12 +0530)]
scsi: cxgb4i: validate tcp sequence number only if chip version <= T5
T6 adapters generate a DDP completion message on receiving all iSCSI PDUs
in a sequence. Because of this, the driver cannot keep track of the TCP
sequence number for T6 adapters.
Benjamin Block [Thu, 21 Feb 2019 09:18:00 +0000 (10:18 +0100)]
scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
We had a test-report where, under memory pressure, adding LUNs to the
systems would fail (the tests add LUNs strictly in sequence):
[ 5525.853432] scsi 0:0:1:1088045124: Direct-Access IBM 2107900 .148 PQ: 0 ANSI: 5
[ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
[ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
[ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
[ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
[ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
[ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
[ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5525.857838] sdk: sdk1
[ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
[ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
[ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
there seems to be no reason to use GFP_ATOMIC here. All the different
call-contexts use a mutex at some point, and nothing in between forbids
sleeping, as far as I could see. Additionally, the code that
later allocates the block queue for the device (scsi_mq_alloc_queue())
already uses GFP_KERNEL.
There are similar allocations in two other functions,
scsi_probe_and_add_lun() and scsi_add_lun(), that can also be done with
GFP_KERNEL.
Here are the contexts for the three functions so far:
scsi: mpt3sas: Add missing breaks in switch statements
Fix the following warnings by adding the proper missing breaks:
drivers/scsi/mpt3sas/mpt3sas_base.c: In function _base_display_OEMs_branding :
drivers/scsi/mpt3sas/mpt3sas_base.c:3548:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
switch (ioc->pdev->subsystem_device) {
^~~~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3566:3: note: here
case MPI2_MFGPAGE_DEVID_SAS2308_2:
^~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3567:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
switch (ioc->pdev->subsystem_device) {
^~~~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3601:3: note: here
case MPI25_MFGPAGE_DEVID_SAS3008:
^~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3735:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
switch (ioc->pdev->subsystem_device) {
^~~~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3745:3: note: here
case MPI2_MFGPAGE_DEVID_SAS2308_2:
^~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3746:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
switch (ioc->pdev->subsystem_device) {
^~~~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3768:3: note: here
default:
^~~~~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
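The pattern behind these warnings is a nested switch whose enclosing case lacks its own break. A minimal sketch, with hypothetical IDs and names rather than the driver's, looks like this:

```c
/* Hypothetical device IDs, for illustration only. */
enum { DEMO_DEV_A = 1, DEMO_DEV_B = 2 };

const char *demo_branding(int device, int subsystem)
{
	const char *name = "unknown";

	switch (device) {
	case DEMO_DEV_A:
		switch (subsystem) {
		case 10:
			name = "A/10";
			break;
		default:
			name = "A/other";
			break;
		}
		break;	/* without this, control falls into case DEMO_DEV_B */
	case DEMO_DEV_B:
		name = "B";
		break;
	default:
		break;
	}
	return name;
}
```

The breaks inside the inner switch only leave the inner switch, so the outer case needs its own break to stop the result from being overwritten by the next case.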
Left-shifting a negative constant is undefined behaviour in the C standard,
and as such newer versions of clang (at least) warn against it. GCC has
supported it for a long time, but it would be better to remove it and rely
on defined behaviour.
My understanding is "~(-1 << N)" in 2's complement is intended to generate
a bit pattern of zeroes ending with N '1' bits. The same can be achieved by
"(1 << N) - 1" in a well-defined way, so switch to it to remove the
warning.
Tested: building a kernel with generic SCSI tape, and checking basic
operations (mt status, mt eject) on a real LTO unit. Cannot test the osst
driver.
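The well-defined form can be checked in isolation. The sketch below uses an unsigned constant, which is a small deviation from the "(1 << N) - 1" expression quoted above, so that the shift stays defined for every n smaller than the bit width of unsigned int. The function name is hypothetical.

```c
/*
 * Build a mask with the low 'n' bits set without shifting a negative value.
 * 'n' must be smaller than the bit width of unsigned int.
 */
unsigned int demo_low_bits_mask(unsigned int n)
{
	return (1u << n) - 1u;
}
```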
Avri Altman [Wed, 20 Feb 2019 07:11:14 +0000 (09:11 +0200)]
scsi: ufs-bsg: Allow reading descriptors
Add this functionality, placing the descriptor being read in the actual
data buffer in the bio.
That is, for both read and write descriptors query upiu, we are using the
job's request_payload. This in turn, is mapped back in user land to the
applicable sg_io_v4 xferp: dout_xferp for write descriptor, and din_xferp
for read descriptor.
Avri Altman [Wed, 20 Feb 2019 07:11:13 +0000 (09:11 +0200)]
scsi: ufs: Allow reading descriptor via raw upiu
Allow reading descriptors via raw upiu. This was in fact forbidden just as
a precaution, as ufs-bsg actually enforces which functionality is
supported.
Avri Altman [Wed, 20 Feb 2019 07:11:12 +0000 (09:11 +0200)]
scsi: ufs-bsg: Change the calling convention for write descriptor
When we had a write descriptor query upiu, we appended the descriptor right
after the bsg request. This was fine as the bsg driver allows to allocate
whatever buffer we needed in its job request.
Still, the proper way to deliver payload, however small (we only write
config descriptors of 144 bytes), is by using the job request payload data
buffer.
So change this ABI now, while ufs-bsg is still new, and nobody is actually
using it.
ret = ufshcd_set_vccq_rail_unused(hba,
(hba->dev_quirks & UFS_DEVICE_NO_VCCQ) ? true : false);
if (ret)
goto out;
=======
ufs_advertise_fixup_device(hba);
>>>>>>> parent of 60f0187031c0... scsi: ufs: disable vccq if it's not needed by UFS device
Resolution: keep HEAD, and delete the ufshcd_set_vccq_rail_unused() call
and corresponding error-handling code.
Clean up loose ends in a follow-up patch.
60f0187031c0 introduced a small power optimization: ignore the vccq load
specified in the UFSHC DT node when said host controller is connected to
specific Flash chips (currently, Samsung and Hynix).
Unfortunately, this optimization breaks UFS on systems where vccq powers
not only the Flash chip, but the host controller as well, such as APQ8098
MEDIABOX or MTP8998:
[ 3.929877] ufshcd-qcom 1da4000.ufshc: ufshcd_query_attr: opcode 0x04 for idn 13 failed, index 0, err = -11
[ 5.433815] ufshcd-qcom 1da4000.ufshc: ufshcd_query_attr: opcode 0x04 for idn 13 failed, index 0, err = -11
[ 6.937771] ufshcd-qcom 1da4000.ufshc: ufshcd_query_attr: opcode 0x04 for idn 13 failed, index 0, err = -11
[ 6.937866] ufshcd-qcom 1da4000.ufshc: ufshcd_query_attr_retry: query attribute, idn 13, failed with error -11 after 3 retires
[ 6.946412] ufshcd-qcom 1da4000.ufshc: ufshcd_disable_auto_bkops: failed to enable exception event -11
[ 6.957972] ufshcd-qcom 1da4000.ufshc: dme-peer-get: attr-id 0x1587 failed 3 retries
[ 6.967181] ufshcd-qcom 1da4000.ufshc: dme-peer-get: attr-id 0x1586 failed 3 retries
[ 6.975025] ufshcd-qcom 1da4000.ufshc: ufshcd_get_max_pwr_mode: invalid max pwm tx gear read = 0
[ 6.982755] ufshcd-qcom 1da4000.ufshc: ufshcd_probe_hba: Failed getting max supported power mode
[ 8.505770] ufshcd-qcom 1da4000.ufshc: ufshcd_query_flag: Sending flag query for idn 3 failed, err = -11
[ 10.009807] ufshcd-qcom 1da4000.ufshc: ufshcd_query_flag: Sending flag query for idn 3 failed, err = -11
[ 11.513766] ufshcd-qcom 1da4000.ufshc: ufshcd_query_flag: Sending flag query for idn 3 failed, err = -11
[ 11.513861] ufshcd-qcom 1da4000.ufshc: ufshcd_query_flag_retry: query attribute, opcode 5, idn 3, failed with error -11 after 3 retires
[ 13.049807] ufshcd-qcom 1da4000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 8 failed, index 0, err = -11
[ 14.553768] ufshcd-qcom 1da4000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 8 failed, index 0, err = -11
[ 16.057767] ufshcd-qcom 1da4000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 8 failed, index 0, err = -11
[ 16.057872] ufshcd-qcom 1da4000.ufshc: ufshcd_read_desc_param: Failed reading descriptor. desc_id 8, desc_index 0, param_offset 0, ret -11
[ 16.067109] ufshcd-qcom 1da4000.ufshc: ufshcd_init_icc_levels: Failed reading power descriptor.len = 98 ret = -11
[ 37.073787] ufshcd-qcom 1da4000.ufshc: link startup failed 1
In my opinion, the rationale for the original patch is questionable. If
neither the UFSHC, nor the Flash chip, require any load from vccq, then
that power rail should simply not be specified at all in the DT.
Working around that fact in the driver is detrimental, as evidenced by the
failure to initialize the host controller on MSM8998.
YueHaibing [Fri, 22 Feb 2019 01:58:58 +0000 (09:58 +0800)]
scsi: megaraid_sas: Remove a bunch of set but not used variables
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'wait_and_poll':
drivers/scsi/megaraid/megaraid_sas_fusion.c:936:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_sync_map_info':
drivers/scsi/megaraid/megaraid_sas_fusion.c:1329:6: warning:
variable 'size_sync_info' set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_init_adapter_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:1639:39: warning:
variable 'reg_set' set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_is_prp_possible':
drivers/scsi/megaraid/megaraid_sas_fusion.c:1925:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_make_prp_nvme':
drivers/scsi/megaraid/megaraid_sas_fusion.c:2047:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_build_ldio_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:2620:42: warning:
variable 'req_desc' set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_build_and_issue_cmd_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:3245:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_task_abort_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:4398:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_reset_target_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:4484:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]
scsi: sd: Optimal I/O size should be a multiple of physical block size
It was reported that some devices report an OPTIMAL TRANSFER LENGTH of
0xFFFF blocks. That looks bogus, especially for a device with a
4096-byte physical block size.
Ignore OPTIMAL TRANSFER LENGTH if it is not a multiple of the device's
reported physical block size.
To make the sanity checking conditionals more readable--and to
facilitate printing warnings--relocate the checking to a helper
function. No functional change aside from the printks.
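The multiple-of-physical-block rule can be sketched as follows. This models only that one rule; the other sanity checks the message alludes to are not reproduced, and treating a zero length as unusable is an assumption of this sketch. The function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a reported optimal transfer length is usable.
 * opt_xfer_blocks is in logical blocks; sector_size and
 * physical_block_size are in bytes.
 */
bool demo_opt_xfer_ok(uint32_t opt_xfer_blocks, uint32_t sector_size,
		      uint32_t physical_block_size)
{
	uint64_t opt_bytes = (uint64_t)opt_xfer_blocks * sector_size;

	/* Assumption of this sketch: zero means "not reported". */
	if (opt_bytes == 0 || physical_block_size == 0)
		return false;
	return opt_bytes % physical_block_size == 0;
}
```

Assuming 512-byte logical blocks, 0xFFFF blocks is 33553920 bytes, which is not a multiple of 4096, so the value from the report would be ignored.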
scsi: MAINTAINERS: SCSI initiator and target tweaks
Nic has been absent for a while and target changes now go through the SCSI
tree. To avoid confusion wrt. the NVMe target, clarify that this entry
refers to the SCSI target subsystem.
Also add patchwork links for both SCSI initiator and target.
Sedat Dilek [Fri, 15 Feb 2019 12:19:20 +0000 (13:19 +0100)]
scsi: fcoe: make use of fip_mode enum complete
commit 1917d42d14b7 ("fcoe: use enum for fip_mode") introduces a separate
enum for the fip_mode that shall be used during initialisation handling
until it is passed to fcoe_ctlr_link_up to set the initial fip_state. That
change was incomplete and gcc quietly converted in various places between
the fip_mode and the fip_state enum values with implicit enum conversions,
which fortunately cannot cause any issues in the actual code's execution.
clang however warns about these implicit enum conversions in the scsi
drivers. This commit consolidates the use of the two enums, guided by
clang's enum-conversion warnings.
This commit now completes the use of the fip_mode: It expects and uses
fip_mode in {bnx2fc,fcoe}_interface_create and fcoe_ctlr_init, and it calls
fcoe_ctlr_set_state() with the correct values in fcoe_ctlr_link_up(). It
also breaks the association between FIP_MODE_AUTO and FIP_ST_AUTO to
indicate these two enums are distinct.
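The idea of keeping two enums distinct and converting between them explicitly can be sketched as below. The enum members and the mapping are hypothetical simplifications chosen for illustration; they are not the values or the mapping used by the fcoe code.

```c
/* Hypothetical, simplified stand-ins; the numeric values deliberately differ. */
enum demo_fip_mode { DEMO_MODE_AUTO, DEMO_MODE_NON_FIP, DEMO_MODE_FABRIC };
enum demo_fip_state { DEMO_ST_DISABLED, DEMO_ST_LINK_WAIT, DEMO_ST_AUTO,
		      DEMO_ST_NON_FIP, DEMO_ST_ENABLED };

/* Translate a configured mode into an initial state explicitly, never by cast. */
enum demo_fip_state demo_initial_state(enum demo_fip_mode mode)
{
	switch (mode) {
	case DEMO_MODE_NON_FIP:
		return DEMO_ST_NON_FIP;
	case DEMO_MODE_FABRIC:
		return DEMO_ST_ENABLED;
	case DEMO_MODE_AUTO:
	default:
		return DEMO_ST_AUTO;
	}
}
```

Because the "auto" members no longer share a numeric value, an accidental implicit conversion produces a visibly wrong state instead of working by coincidence.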
scsi: qla2xxx: Avoid PCI IRQ affinity mapping when multiqueue is not supported
This patch fixes a warning seen when BLK-MQ is enabled and the hardware
does not support MQ. This results in the driver requesting MSIx vectors
which are equal to or fewer than pre_desc via the PCI IRQ Affinity
infrastructure.
Quinn Tran [Fri, 15 Feb 2019 22:37:19 +0000 (14:37 -0800)]
scsi: qla2xxx: Move marker request behind QPair
Current code hard codes the marker request to use request and response
queue 0. This patch makes use of the qpair as the path to access the
request/response queues. It allows the marker to be placed on any hardware
queue.
Anil Gurumurthy [Fri, 15 Feb 2019 22:37:17 +0000 (14:37 -0800)]
scsi: qla2xxx: Add support for setting port speed
This patch adds a sysfs node:
1. There is a new sysfs node port_speed
2. The possible values are 2(Auto neg), 8, 16, 32
3. A value outside of the above defaults to Auto neg
4. Any update to the setting causes a link toggle
5. This feature is currently only for ISP27xx
Himanshu Madhani [Fri, 15 Feb 2019 22:37:12 +0000 (14:37 -0800)]
scsi: qla2xxx: Fix LUN discovery if loop id is not assigned yet by firmware
This patch fixes LUN discovery when loop ID is not yet assigned by the
firmware during driver load/sg_reset operations. Driver will now search for
new loop id before retrying login.
Colin Ian King [Fri, 15 Feb 2019 09:52:32 +0000 (09:52 +0000)]
scsi: qla2xxx: remove redundant null check on pointer sess
The null check on pointer sess and the subsequent call is redundant as sess
is null on all the paths that lead to the out_term2 label. Hence the
null check and the call can be removed. Also remove the redundant setting
of sess to NULL as this is not required now.
Detected by CoverityScan, CID#1420663 ("Logically dead code")
Bill Kuzeja [Thu, 14 Feb 2019 15:52:29 +0000 (10:52 -0500)]
scsi: qla2xxx: Move debug messages before sending srb preventing panic
When sending an srb with qla2x00_start_sp, the sp can complete and be freed
by the time we log the debug message saying we sent it. This can cause a
panic if sp gets reused quickly or when running a kernel that poisons freed
memory.
This was partially fixed by (not every case was addressed):
Commit 9fe278f44b4b ("scsi: qla2xxx: Move log messages before issuing
command to firmware")
YueHaibing [Thu, 14 Feb 2019 01:51:52 +0000 (01:51 +0000)]
scsi: lpfc: Remove set but not used variable 'phys_id'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/lpfc/lpfc_init.c: In function 'lpfc_cpu_affinity_check':
drivers/scsi/lpfc/lpfc_init.c:10599:19: warning:
variable 'phys_id' set but not used [-Wunused-but-set-variable]
It has never been used since its introduction in commit 6a828b0f6192
("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware
queues")
Add HI3670 SoC UFS driver support by extending the common ufs-hisi
driver. One major difference between HI3660 and HI3670 SoCs in terms of UFS
is the PHY. HI3670 has a 10nm variant PHY and hence this parameter is used
to distinguish the configuration.
Add devicetree binding for HI3670 UFS controller. HI3670 SoC is very
similar to HI3660 SoC with almost the same IPs. The only major difference
in terms of UFS is the PHY. HI3670 has a 10nm PHY. But since the original
driver (HI3660 UFS) cannot make HI3670 UFS functional, a separate
compatible is added for HI3670 without any fallback.
Dietmar Hahn [Tue, 5 Feb 2019 10:10:48 +0000 (11:10 +0100)]
scsi: sd: Fix typo in sd_first_printk()
Commit b2bff6ceb61a9 ("[SCSI] sd: Quiesce mode sense error messages")
added the macro sd_first_printk(). The macro takes "sdsk" as argument
but dereferences "sdkp". This hasn't caused any real issues since all
callers of sd_first_printk() have an sdkp. But fix the typo.
[mkp: Turned this into a real patch and tweaked commit description]
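The kind of typo described above, where a macro body refers to a caller-scope name instead of its own parameter, can be reproduced in a few lines. The macros and struct below are hypothetical reductions, not the sd driver's definitions.

```c
struct demo_disk {
	int first_scan;
};

/*
 * Buggy form: the parameter is 'sdsk', but the body uses 'sdkp', so it only
 * compiles when the caller happens to have a variable named 'sdkp' in scope.
 */
#define DEMO_FIRST_SCAN_BUGGY(sdsk) ((sdkp)->first_scan)

/* Fixed form: the body refers to the macro's own parameter. */
#define DEMO_FIRST_SCAN_FIXED(sdsk) ((sdsk)->first_scan)

int demo_first_scan_of(struct demo_disk *other, struct demo_disk *sdkp)
{
	/* The buggy macro silently ignores 'other' and reads 'sdkp' instead. */
	return DEMO_FIRST_SCAN_BUGGY(other) * 10 + DEMO_FIRST_SCAN_FIXED(other);
}
```

As long as every caller passes a variable that is itself named sdkp, both forms read the same object, which is why the typo went unnoticed.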
Shivasharan S [Fri, 8 Feb 2019 08:22:46 +0000 (00:22 -0800)]
scsi: megaraid_sas: Update structures for HOST_DEVICE_LIST DCMD
Add padding to make the structure variables in MR_HOST_DEVICE_LIST_ENTRY
64-bit aligned. Also, add reserved fields to MR_HOST_DEVICE_LIST for
future firmware usage.
Dan Carpenter [Mon, 11 Feb 2019 18:43:00 +0000 (21:43 +0300)]
scsi: lpfc: Fix error code if kcalloc() fails
This should return -ENOMEM if kcalloc() fails, but it accidentally
returns success instead.
Fixes: 6a828b0f6192 ("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Ewan D. Milne <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
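A userspace sketch of the error path follows, using a calloc()-compatible function pointer in place of kcalloc() so that a failure can be injected. All names are hypothetical; the only point taken from the commit is that the error code must be set before jumping to the common exit.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator signature matching calloc(), so a failing one can be injected. */
typedef void *(*demo_calloc_fn)(size_t nmemb, size_t size);

/* Helper that always fails, standing in for an out-of-memory condition. */
void *demo_failing_calloc(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}

/* Allocate a per-CPU map; returns 0 on success or -ENOMEM on failure. */
int demo_alloc_cpu_map(int **map, size_t ncpus, demo_calloc_fn alloc)
{
	int rc = 0;

	*map = alloc(ncpus, sizeof(**map));
	if (!*map) {
		rc = -ENOMEM;	/* omit this and the caller sees success */
		goto out;
	}
out:
	return rc;
}
```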
Bart Van Assche [Fri, 8 Feb 2019 21:25:02 +0000 (13:25 -0800)]
scsi: sd: Remove superfluous residual assignments
Since commit 26e85fcd15f6 ("[SCSI] sd: Permit merged discard requests";
kernel v3.10) sd_done() sets the residual not only for failed special
requests but also for special requests that succeeded. Hence remove the
code from functions called by sd_init_command() that sets the residual.
This patch does not change any functionality.
Bart Van Assche [Fri, 8 Feb 2019 21:21:27 +0000 (13:21 -0800)]
scsi: scsi_debug: Fix a recently introduced regression
A recent commit removed an element from opcode_info_arr[] but did not
modify opcode_ind_arr[] nor was SDEB_I_XDWRITEREAD removed. Remove
SDEB_I_XDWRITEREAD and bring the two arrays again in sync. This patch
avoids that the following is reported:
BUG: KASAN: null-ptr-deref in scsi_debug_queuecommand+0x60f/0xc90 [scsi_debug]
Read of size 1 at addr 0000000000000001 by task iscsi-test-cu/683
CPU: 3 PID: 683 Comm: iscsi-test-cu Not tainted 5.0.0-rc5-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0x86/0xca
kasan_report.cold.3+0x5/0x3e
__asan_load1+0x47/0x50
scsi_debug_queuecommand+0x60f/0xc90 [scsi_debug]
scsi_queue_rq+0xc17/0x12e0
blk_mq_dispatch_rq_list+0x5fc/0xb10
blk_mq_sched_dispatch_requests+0x2f7/0x300
__blk_mq_run_hw_queue+0xd6/0x180
__blk_mq_delay_run_hw_queue+0x25c/0x290
blk_mq_run_hw_queue+0x119/0x1b0
blk_mq_sched_insert_request+0x274/0x350
blk_execute_rq_nowait+0x78/0x90
blk_execute_rq+0xcc/0x140
sg_io+0x30f/0x700
scsi_cmd_ioctl+0x4d4/0x540
scsi_cmd_blk_ioctl+0x7b/0x8b
sd_ioctl+0xba/0x150
blkdev_ioctl+0x6e1/0xea0
block_ioctl+0x79/0x90
do_vfs_ioctl+0x12b/0x9b0
ksys_ioctl+0x41/0x80
__x64_sys_ioctl+0x43/0x50
do_syscall_64+0x71/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Xiang Chen [Wed, 6 Feb 2019 10:52:55 +0000 (18:52 +0800)]
scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental
For auto-control irq affinity mode, choose the dq to deliver IO according
to the current CPU.
This reduces the performance regression seen when fio and CQ interrupts are
processed on different nodes.
For user-control irq affinity mode, keep the behaviour as before.
To realize this, we also need to distinguish the usage of the dq lock and
the sas_dev lock.
We mark as experimental due to ongoing discussion on managed MSI IRQ
during hotplug:
https://marc.info/?l=linux-scsi&m=154876335707751&w=2
We're almost at the point where we can expose multiple queues to the upper
layer for SCSI MQ, but we need to sort out the per-HBA tags performance
issue.
John Garry [Wed, 6 Feb 2019 10:52:54 +0000 (18:52 +0800)]
scsi: hisi_sas: Issue internal abort on all relevant queues
To support queues mapped to a CPU, it needs to be ensured that issuing an
internal abort is safe, in that it is guaranteed that an internal abort is
processed for a single IO or a device after all the relevant command(s)
which it is attempting to abort have been processed by the controller.
Currently we only deliver commands for any device on a single queue to
solve this problem, as we know that commands issued on the same queue will
be processed in order, and we will not have a scenario where the internal
abort is racing against a command(s) which it is trying to abort.
When enqueuing commands on a queue mapped to a CPU, the queue for a command
is chosen based on the queue associated with the current CPU, so this is
not safe for internal abort, since it would definitely not be guaranteed
that commands for a given device are issued on the same queue.
To solve this issue, we take a bludgeoning approach, and issue a separate
internal abort on any queue(s) relevant to the command or device, in that
we will be guaranteed that at least one of these internal aborts will be
received last in the controller.
So, for aborting a single command, we can just force the internal abort to
be issued on the same queue as the command which we are trying to abort.
For aborting all commands associated with a device, we issue a separate
internal abort on all relevant queues. Issuing multiple internal aborts in
this fashion would have no side effect.
Xiang Chen [Wed, 6 Feb 2019 10:52:53 +0000 (18:52 +0800)]
scsi: hisi_sas: change queue depth from 512 to 4096
When sending IOs to many disks from a single queue, it is possible that the
queue becomes full. To avoid this situation, change the queue depth from
512 to 4096, which is the max number of IOs for v3 hw.
Douglas Anderson [Fri, 12 Oct 2018 21:39:26 +0000 (14:39 -0700)]
scsi: dt-bindings: ufs: Fix the compatible string definition
If you look at the bindings for the UFS Host Controller it says:
- compatible: must contain "jedec,ufs-1.1" or "jedec,ufs-2.0", may
also list one or more of the following:
"qcom,msm8994-ufshc"
"qcom,msm8996-ufshc"
"qcom,ufshc"
My reading of that is that it's fine to just have either of these:
1. "qcom,msm8996-ufshc", "jedec,ufs-2.0"
2. "qcom,ufshc", "jedec,ufs-2.0"
As far as I can tell neither of the above is actually a good idea.
For #1 it turns out that the driver currently only keys off the
compatible string "qcom,ufshc" so it won't actually probe.
For #2 the driver will probe but it's not a good idea to keep the SoC
name out of the compatible string.
Let's update the compatible string to make it really explicit. We'll
include a nod to the existing driver and the old binding and say that
we should always include the "qcom,ufshc" string in addition to the
SoC compatible string.
While we're at it we'll also include another example SoC known to have
UFS: sdm845.
scsi: ata: Use unsigned int for cmd's type in ioctls in scsi_host_template
Clang warns several times in the scsi subsystem (trimmed for brevity):
drivers/scsi/hpsa.c:6209:7: warning: overflow converting case value to
switch condition type (2147762695 to 18446744071562347015) [-Wswitch]
case CCISS_GETBUSTYPES:
^
drivers/scsi/hpsa.c:6208:7: warning: overflow converting case value to
switch condition type (2147762694 to 18446744071562347014) [-Wswitch]
case CCISS_GETHEARTBEAT:
^
The root cause is that the _IOC macro can generate really large numbers,
which don't fit into type 'int', which is used for the cmd parameter in
the ioctls in scsi_host_template. My research into how GCC and Clang are
handling this at a low level didn't prove fruitful. However, looking at
the rest of the kernel tree, all ioctls use an 'unsigned int' for the
cmd parameter, which will fit all of the _IOC values in the scsi/ata
subsystems.
Make that change because none of the ioctls expect a negative value for
any command, it brings the ioctls in line with the rest of the kernel,
and it removes ambiguity, which is never good when dealing with compilers.
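The numbers in the warnings can be reproduced with a small sketch. It
assumes the generic Linux ioctl number layout (nr in bits 0-7, type in bits
8-15, size in bits 16-29, direction in bits 30-31); the DEMO_* macros and
function names are made up for illustration. Decomposing 2147762695 under
that layout gives direction "read", type 'B', nr 7 and size 4. The
conversion of an out-of-range value to int is implementation-defined; the
sketch relies on the wrap-around behaviour of GCC and Clang.

```c
#include <limits.h>

/* Demo re-creation of the assumed generic ioctl number layout. */
#define DEMO_IOC_READ 2U
#define DEMO_IOC(dir, type, nr, size) \
	(((unsigned int)(dir) << 30) | ((unsigned int)(size) << 16) | \
	 ((unsigned int)(type) << 8) | (unsigned int)(nr))

/* Value from the first warning: read direction, type 'B', nr 7, size 4. */
#define DEMO_GETBUSTYPES DEMO_IOC(DEMO_IOC_READ, 'B', 7, 4)

/* Non-zero if cmd can be represented as a (non-negative) int. */
static int demo_fits_in_int(unsigned int cmd)
{
	return cmd <= (unsigned int)INT_MAX;
}

/*
 * What the value turns into when it passes through 'int' and is then
 * widened to 64 bits: the set top bit is sign-extended.
 */
static unsigned long long demo_widen_via_int(unsigned int cmd)
{
	int as_int = (int)cmd;	/* implementation-defined when out of range */

	return (unsigned long long)(long long)as_int;
}
```

Because the direction field sits in the top two bits, any command with the
read bit set has bit 31 set and cannot fit in a signed 32-bit int, while it
always fits in an unsigned int.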
James Smart [Mon, 28 Jan 2019 19:14:40 +0000 (11:14 -0800)]
scsi: lpfc: Fix nvmet issues when link bounce under IO load
Various null pointer dereference and general protection fault panics occur
when there is a link bounce under load. There are also a large number of
"error" 6413 messages indicating "bad release".
The issues resolve to list corruptions due to missing or inconsistent lock
protection. Lockups are due to nested locks in the unsolicited abort
path. The unsolicited abort path calls the wrong abort processing
routine. There was also duplicate context release while aborts were still
active in the hardware.
Removed duplicate locks and added lock protection around list item
removal. Commonized lock handling around the abort processing routines.
Prevent context release while still in ABTS list.
James Smart [Mon, 28 Jan 2019 19:14:39 +0000 (11:14 -0800)]
scsi: lpfc: Correct upcalling nvmet_fc transport during io done downcall
When the transport calls into the lpfc target to release an IO job
structure, which corresponds to an exchange, and if the driver was waiting
for an exchange in order to post a previously received command to the
transport, the driver immediately takes the IO job and reuses the context
for the prior command and calls nvmet_fc_rcv_fcp_req() to tell the
transport about a newly received command.
Problem is, the execution of the IO job release may be in the context of
the back end driver and its bio completion handlers, thus it may be in an
irq context, and protection code kicks in within the bio and request layers
that are subsequently called.
Rework lpfc so that instead of immediately upcalling, queue it to a
deferred work thread and have the thread make the upcall.
Took advantage of this change to remove duplicated code with the normal
command receive path that preps the IO job and upcalls nvmet_fc. Created a
common routine both paths use.
Also corrected some errors that were found during review of the context
freeing and reuse - basically unlocked operations and a somewhat disjoint
set of calls to release associated job elements. Cleaned up this path and
added locks for coherency.
James Smart [Mon, 28 Jan 2019 19:14:38 +0000 (11:14 -0800)]
scsi: lpfc: Fix default driver parameter collision for allowing NPIV support
The conversion to enable SCSI and NVME fc4 support ran into an issue with
NPIV support. With NVME, NPIV is not currently supported, but with SCSI it
was. The driver reverted to its lowest setting meaning NPIV with SCSI was
not allowed.
Convert the NPIV checks and implementation so that SCSI can continue to
allow NPIV support.
James Smart [Mon, 28 Jan 2019 19:14:37 +0000 (11:14 -0800)]
scsi: lpfc: Rework locking on SCSI io completion
A scsi host lock is taken on every io completion to check whether the abort
handler is waiting on the io completion. This is an expensive lock to take
on every completion when an abort condition is rare.
Replace scsi host lock with command-specific lock. Synchronize completion
and abort paths by new cmd lock. Ensure all flag changing and nulling of
context pointers is done under lock. When adding lock to task management
abort, realized it was missing other synchronization locks. Added that
synchronization to match normal paths.
James Smart [Mon, 28 Jan 2019 19:14:36 +0000 (11:14 -0800)]
scsi: lpfc: Enable SCSI and NVME fc4s by default
Now that performance mods don't split resources by protocol and enable both
protocols by default, there's no reason not to enable concurrent SCSI and
NVME fc4 support.
James Smart [Mon, 28 Jan 2019 19:14:35 +0000 (11:14 -0800)]
scsi: lpfc: Resize cpu maps structures based on possible cpus
The work done to date utilized the number of present cpus when sizing
per-cpu structures. Structures should have been sized based on the max
possible cpu count.
Convert the driver over to possible cpu count for sizing allocation.
James Smart [Mon, 28 Jan 2019 19:14:33 +0000 (11:14 -0800)]
scsi: lpfc: Rework EQ/CQ processing to address interrupt coalescing
When driving high iop counts, auto_imax coalescing kicks in and drives the
performance to extremely small iops levels.
There are two issues:
1) auto_imax is enabled by default. The auto algorithm, when iops gets
high, divides the iops by the hdwq count and uses that value to
calculate EQ_Delay. The EQ_Delay is set uniformly on all EQs whether
they have load or not. The EQ_delay is only manipulated every 5s (a
long time). Thus there were large 5s swings of no interrupt delay
followed by large/maximum delay, before repeating.
2) When processing a CQ, the driver got mixed up on the rate of when
to ring the doorbell to keep the chip apprised of the eqe or cqe
consumption as well as how long to sit in the thread and
process queue entries. Currently, the driver capped its work at
64 entries (very small) and exited/rearmed the CQ. Thus, on heavy
loads, additional overheads were taken to exit and re-enter the
interrupt handler. Worse, if in the large/maximum coalescing
windows, it could be a while before getting back to servicing.
The issues are corrected by the following:
- A change in defaults. Auto_imax is turned OFF and fcp_imax is set
to 0. Thus all interrupts are immediate.
- Cleanup of field names and their meanings. Existing names were
non-intuitive or used for duplicate things.
- Added max_proc_limit field, to control the length of time the
handlers would service completions.
- Reworked EQ handling:
Added common routine that walks eq, applying notify interval and max
processing limits. Use queue_claimed to claim ownership of the queue
while processing. Always rearm the queue whenever the common routine
is called.
Rework queue element processing, namely to eliminate hba_index vs
host_index. Only one index is necessary. The queue entry can be
marked invalid and the host_index updated immediately after eqe
processing.
After rework, xx_release routines are now DB write functions. Renamed
the routines as such.
Moved lpfc_sli4_eq_flush(), which does similar action, to same area.
Replaced the 2 individual loops that walk an eq with a call to the
common routine.
Slightly revised lpfc_sli4_hba_handle_eqe() calling syntax.
Added per-cpu counters to detect interrupt rates and scale
interrupt coalescing values.
- Reworked CQ handling:
Added common routine that walks cq, applying notify interval and max
processing limits. Use queue_claimed to claim ownership of the queue
while processing. Always rearm the queue whenever the common routine
is called.
Rework queue element processing, namely to eliminate hba_index vs
host_index. Only one index is necessary. The queue entry can be
marked invalid and the host_index updated immediately after cqe
processing.
After rework, xx_release routines are now DB write functions. Renamed
the routines as such.
Replaced the 3 individual loops that walk a cq with a call to the
common routine.
Redefined lpfc_sli4_sp_handle_mcqe() to common handler definition with
queue reference. Add increment for mbox completion to handler.
- Added a new module/sysfs attribute: lpfc_cq_max_proc_limit, to allow
dynamic changing of the CQ max_proc_limit value being used.
Although this leaves an EQ as an immediate interrupt, that interrupt will
only occur if a CQ bound to it is in an armed state and has cqe's to
process. By staying in the cq processing routine longer, high loads will
avoid generating more interrupts as they will only rearm as the processing
thread exits. The immediate interrupt is also beneficial to idle or
lower-processing CQ's as they get serviced immediately without being
penalized by sharing an EQ with a more loaded CQ.
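The common queue-walking routine described above might look roughly like
the following. This is a single-threaded sketch under stated assumptions:
the demo_queue structure and its fields are hypothetical, doorbell writes
are only counted, and the plain "claimed" flag stands in for whatever
atomic claim the driver performs via queue_claimed.

```c
/* Hypothetical queue state; field names are not the driver's. */
struct demo_queue {
	unsigned pending;         /* valid entries waiting to be consumed */
	unsigned notify_interval; /* entries consumed between doorbell writes */
	unsigned max_proc_limit;  /* max entries handled per invocation */
	unsigned doorbells;       /* doorbell writes issued so far */
	int claimed;              /* non-zero while a walker owns the queue */
	int armed;                /* non-zero when the queue may interrupt */
};

/*
 * Walk the queue: consume up to max_proc_limit entries, tell the hardware
 * about consumption every notify_interval entries without rearming, then
 * always rearm on exit. Returns the number of entries consumed, or 0 if
 * the queue is already claimed by another walker.
 */
static unsigned demo_process_queue(struct demo_queue *q)
{
	unsigned consumed = 0;
	unsigned unreported = 0;

	if (q->claimed)
		return 0;
	q->claimed = 1;
	q->armed = 0;

	while (q->pending > 0 && consumed < q->max_proc_limit) {
		q->pending--;		/* entry handled and marked invalid */
		consumed++;
		unreported++;
		if (unreported == q->notify_interval) {
			q->doorbells++;	/* report consumption, no rearm */
			unreported = 0;
		}
	}

	q->doorbells++;			/* final write: report and rearm */
	q->armed = 1;
	q->claimed = 0;
	return consumed;
}
```

With a larger max_proc_limit the loop stays in the handler longer, so a
busy queue is rearmed (and can interrupt again) less often.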
James Smart [Mon, 28 Jan 2019 19:14:32 +0000 (11:14 -0800)]
scsi: lpfc: cleanup: convert eq_delay to usdelay
Review of the eq coalescing logic showed the code was a bit fragmented.
Sometimes it would save/set via an interrupt max value, while in others it
would do so via a usdelay. There were also two places changing eq delay,
one place that issued mailbox commands, and another that changed via
register writes if supported.
Clean this up by:
- Standardizing the operation of lpfc_modify_hba_eq_delay() routine so
that it is always told of a us delay to impose. The routine then chooses
the best way to set that - via register or via mbx.
- Rather than two value types stored in eq->q_mode (usdelay if change via
register, imax if change via mbox) - q_mode always contains usdelay.
Before any value change, old vs new value is compared and only if
different is a change done.
- Revised the dmult calculation. dmult is no longer set based on overall
imax divided by hardware queues - instead imax applies to a single cpu and
the value will be replicated to all cpus.
James Smart [Mon, 28 Jan 2019 19:14:31 +0000 (11:14 -0800)]
scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues
So far MSIX vector allocation assumed it would be 1:1 with hardware
queues. However, there are several reasons why fewer MSIX vectors may be
allocated than hardware queues such as the platform being out of vectors or
adapter limits being less than cpu count.
This patch reworks the MSIX/EQ relationships with the per-cpu hardware
queues so they can function independently. MSIX vectors will be equitably
split between cpu sockets/cores and then the per-cpu hardware queues will be
mapped to the vectors most efficient for them.
James Smart [Mon, 28 Jan 2019 19:14:30 +0000 (11:14 -0800)]
scsi: lpfc: Fix setting affinity hints to correlate with hardware queues
The desired affinity for the hardware queue behavior is for hdwq 0 to be
affinitized with cpu 0, hdwq 1 to cpu 1, and so on. The implementation so
far does not do this if the number of cpus is greater than the number of
hardware queues (e.g. hardware queue allocation was administratively
reduced or hardware queue resources could not scale to the cpu count).
Correct the queue affinitization logic when queue count is less than
cpu count.
James Smart [Mon, 28 Jan 2019 19:14:29 +0000 (11:14 -0800)]
scsi: lpfc: Allow override of hardware queue selection policies
Default behavior is to use the information from the upper IO stacks to
select the hardware queue to use for IO submission, which typically has
good cpu affinity.
However, the driver, when used on some variants of the upstream kernel, has
found queuing information to be suboptimal for FCP or IO completion locked
on particular cpus.
For command submission situations, the lpfc_fcp_io_sched module parameter
can be set to specify a hardware queue selection policy that overrides the
os stack information.
For IO completion situations, rather than queuing cq processing based on the
cpu servicing the interrupting event, schedule the cq processing on the cpu
associated with the hardware queue's cq.
James Smart [Mon, 28 Jan 2019 19:14:28 +0000 (11:14 -0800)]
scsi: lpfc: Adapt partitioned XRI lists to efficient sharing
The XRI get/put lists were partitioned per hardware queue. However, the
adapter rarely had sufficient resources to give a large number of resources
per queue. As such, it became common for a cpu to encounter a lack of XRI
resource and request the upper io stack to retry after returning a BUSY
condition. This occurred even though other cpus were idle and not using
their resources.
Create as efficient a scheme as possible to move resources to the cpus that
need them. Each cpu maintains a small private pool which it allocates from
for io. There is a watermark that the cpu attempts to keep in the private
pool. The private pool, when empty, pulls from a global pool for the
cpu. When the cpu's global pool is empty it will pull from another cpu's
global pool. As there are many cpu global pools (1 per cpu or hardware queue
count) and as each cpu selects what cpu to pull from at different rates and
at different times, it creates a randomizing effect that minimizes the
number of cpu's that will contend with each other when they steal XRI's from
another cpu's global pool.
On io completion, a cpu will push the XRI back on to its private pool. A
watermark level is maintained for the private pool such that when it is
exceeded it will move XRI's to the CPU global pool so that other cpu's may
allocate them.
On NVME, as heartbeat commands are critical to get placed on the wire, a
single expedite pool is maintained. When a heartbeat is to be sent, it will
allocate an XRI from the expedite pool rather than the normal cpu
private/global pools. On any io completion, if a reduction in the expedite
pools is seen, it will be replenished before the XRI is placed on the cpu
private pool.
Statistics are added to aid understanding the XRI levels on each cpu and
their behaviors.
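The private/global pool scheme can be sketched with simple counters. The
sketch makes several assumptions that are not in the text above: a single
watermark is used both as the refill amount and as the spill threshold, a
victim is chosen by scanning from the next cpu upward, locking is omitted,
and the NVME expedite pool is left out. All names are hypothetical.

```c
#define DEMO_NCPU 4

struct demo_cpu_pools {
	unsigned private_cnt;	/* XRIs in this cpu's private pool */
	unsigned global_cnt;	/* XRIs in this cpu's global pool */
};

struct demo_xri_state {
	struct demo_cpu_pools cpu[DEMO_NCPU];
	unsigned watermark;	/* target level for a private pool */
};

/* Move up to 'want' XRIs from src's global pool to dst's private pool. */
static unsigned demo_refill(struct demo_cpu_pools *dst,
			    struct demo_cpu_pools *src, unsigned want)
{
	unsigned n = src->global_cnt < want ? src->global_cnt : want;

	src->global_cnt -= n;
	dst->private_cnt += n;
	return n;
}

/* Returns 1 if an XRI was obtained, 0 if the caller must report BUSY. */
static int demo_xri_get(struct demo_xri_state *s, unsigned cpu)
{
	struct demo_cpu_pools *me = &s->cpu[cpu];
	unsigned i;

	if (me->private_cnt == 0 &&
	    demo_refill(me, me, s->watermark) == 0) {
		/* Own global pool is empty: steal from another cpu. */
		for (i = 1; i < DEMO_NCPU; i++) {
			struct demo_cpu_pools *victim =
				&s->cpu[(cpu + i) % DEMO_NCPU];

			if (demo_refill(me, victim, s->watermark) > 0)
				break;
		}
	}
	if (me->private_cnt == 0)
		return 0;
	me->private_cnt--;
	return 1;
}

/* On io completion: return the XRI, spilling any excess to the global pool. */
static void demo_xri_put(struct demo_xri_state *s, unsigned cpu)
{
	struct demo_cpu_pools *me = &s->cpu[cpu];

	me->private_cnt++;
	if (me->private_cnt > s->watermark) {
		unsigned excess = me->private_cnt - s->watermark;

		me->private_cnt -= excess;
		me->global_cnt += excess;
	}
}
```

In this sketch BUSY is only returned once every global pool is empty, which
is the behaviour the partitioned lists could not provide.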
James Smart [Mon, 28 Jan 2019 19:14:26 +0000 (11:14 -0800)]
scsi: lpfc: Convert ring number to hardware queue for nvme wqe posting.
SLI4 nvme functions are passing the SLI3 ring number when posting wqe to
hardware. This should be indicating the hardware queue to use, not the ring
number.
Replace ring number with the hardware queue that should be used.
Note: SCSI avoided this issue as it utilized an older lpfc_issue_iocb
routine that properly adapts.
James Smart [Mon, 28 Jan 2019 19:14:25 +0000 (11:14 -0800)]
scsi: lpfc: Move SCSI and NVME Stats to hardware queue structures
Many io statistics were being sampled and saved using adapter-based data
structures. This was creating a lot of contention and cache thrashing in
the I/O path.
Move the statistics to the hardware queue data structures. Given the
per-queue data structures, use of atomic types is lessened.
Add new sysfs and debugfs stat routines to collate the per hardware queue
values and report at an adapter level.
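A minimal sketch of the collation step, assuming each hardware queue keeps
its own plain counters and a reporting routine sums them on demand. The
structure and field names are hypothetical, not the driver's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-hardware-queue counters, updated only by their owner. */
struct demo_hdwq_stats {
	uint64_t io_submitted;
	uint64_t io_completed;
};

/* Sum the per-queue counters into a single adapter-level view. */
static struct demo_hdwq_stats
demo_collate(const struct demo_hdwq_stats *q, size_t nq)
{
	struct demo_hdwq_stats total = { 0, 0 };
	size_t i;

	for (i = 0; i < nq; i++) {
		total.io_submitted += q[i].io_submitted;
		total.io_completed += q[i].io_completed;
	}
	return total;
}
```

The cost of the summation moves to the rarely used reporting path, while
the I/O path only touches counters local to its own queue.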
James Smart [Mon, 28 Jan 2019 19:14:24 +0000 (11:14 -0800)]
scsi: lpfc: Adapt cpucheck debugfs logic to Hardware Queues
Similar to the io execution path that reports cpu context information, the
debugfs routines for cpu information need to be aligned with the new
hardware queue implementation.
Convert debugfs and nvme cpucheck statistics to report information per
Hardware Queue.
James Smart [Mon, 28 Jan 2019 19:14:22 +0000 (11:14 -0800)]
scsi: lpfc: Partition XRI buffer list across Hardware Queues
Once the IO buff allocations were made shared, there was a single XRI
buffer list shared by all hardware queues. A single list isn't great for
performance when shared across the per-cpu hardware queues.
Create a separate XRI IO buffer get/put list for each Hardware Queue. As
SGLs and associated IO buffers get allocated/posted to the firmware, round
robin their assignment across all available Hardware Queues so that there
is an equitable assignment.
Modify SCSI and NVME IO submit code paths to use the Hardware Queue logic
for XRI allocation.
Add a debugfs interface to display hardware queue statistics.
Added new empty_io_bufs counter to track if a cpu runs out of XRIs.
Replace common_ variables/names with io_ to make meanings clearer.
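The round robin assignment of buffers to Hardware Queues can be sketched as
below. The function name and the use of simple per-queue counts instead of
real get/put lists are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Distribute nbufs buffers across nq queues in round robin order.
 * counts[] must have nq entries; on return it holds the number of
 * buffers assigned to each queue.
 */
static void demo_round_robin(size_t nbufs, size_t nq, size_t *counts)
{
	size_t next = 0;
	size_t b, q;

	for (q = 0; q < nq; q++)
		counts[q] = 0;
	if (nq == 0)
		return;

	for (b = 0; b < nbufs; b++) {
		counts[next]++;			/* buffer b goes to queue 'next' */
		next = (next + 1) % nq;
	}
}
```

No queue ends up with more than one buffer above any other, which is the
equitable assignment referred to above.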
James Smart [Mon, 28 Jan 2019 19:14:21 +0000 (11:14 -0800)]
scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu
Currently, both nvme and fcp each have their own concept of an io_channel,
which is a combination wq/cq and associated msix. Different cpus would
share an io_channel.
The driver is now moving to per-cpu wq/cq pairs and msix vectors. The
driver will still use separate wq/cq pairs per protocol on each cpu, but
the protocols will share the msix vector.
Given the elimination of the nvme and fcp io channels, the module
parameters will be removed. A new parameter, lpfc_hdw_queue is added which
allows the wq/cq pair allocation per cpu to be overridden and allocated to
a lesser value. If lpfc_hdw_queue is zero, the number of pairs allocated will
be based on the number of cpus. If non-zero, the parameter specifies the
number of queues to allocate. At this time, the maximum non-zero value is
64.
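The parameter semantics just described can be summarised in a few lines.
This is a sketch only: the function name is hypothetical, and rejecting a
value above 64 by returning 0 is an assumption, since the text does not say
how an out-of-range value is handled.

```c
#define DEMO_HDW_QUEUE_MAX 64U

/*
 * Number of wq/cq pairs to allocate for a given lpfc_hdw_queue value.
 * Zero means "one per cpu"; a non-zero value is used as given.
 * Returns 0 if param exceeds the documented maximum (assumed behaviour).
 */
static unsigned demo_hdw_queue_count(unsigned param, unsigned num_cpus)
{
	if (param > DEMO_HDW_QUEUE_MAX)
		return 0;
	return param == 0 ? num_cpus : param;
}
```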
To manage this new paradigm, a new hardware queue structure is created to
track queue activity and relationships.
As MSIX vector allocation must be known before setting up the
relationships, msix allocation now occurs before queue data structures are
allocated. If the number of vectors allocated is less than the desired
hardware queues, the hardware queue counts will be reduced to the number of
vectors