Git Repo - linux.git/log

scsi: hisi_sas: Set PHY linkrate when disconnected

When the PHY comes down, we currently do not set the negotiated linkrate:

root@(none)$ pwd
/sys/class/sas_phy/phy-0:0
root@(none)$ more enable
1
root@(none)$ more negotiated_linkrate
12.0 Gbit
root@(none)$ echo 0 > enable
root@(none)$ more negotiated_linkrate
12.0 Gbit
root@(none)$

This patch fixes the driver code to set it properly when the PHY comes
down.

If the PHY had been enabled, then set unknown; otherwise, flag as disabled.

The logical place to set the negotiated linkrate for this scenario is the
PHY down routine, which is called from the PHY down ISR.

However, it is not possible to know if the PHY comes down due to PHY
disable or loss of link, as sas_phy.enabled member is not set until after
the transport disable routine is complete, which races with the PHY down
ISR.

As an imperfect solution, use sas_phy_data.enable as the flag to know if
the PHY is down due to disable. It's imperfect, as sas_phy_data is internal
to libsas.

I can't see another way without adding a new field to hisi_sas_phy and
managing it, or changing SCSI SAS transport.
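
As a rough illustration of the rule above (enabled means unknown, otherwise
disabled), here is a minimal user-space sketch. The enum and the function
name are hypothetical stand-ins, not the kernel's definitions or the
driver's actual code:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the linkrate values reported via sysfs. */
enum demo_linkrate {
	DEMO_LINK_RATE_UNKNOWN,
	DEMO_PHY_DISABLED,
	DEMO_LINK_RATE_12_0_GBPS,
};

/*
 * Pick the linkrate to report once the PHY has gone down.
 * 'enable_flag' plays the role of the enable flag consulted in the
 * PHY down routine: still set means the link was lost, cleared means
 * the PHY was disabled on purpose.
 */
static enum demo_linkrate demo_linkrate_on_phy_down(bool enable_flag)
{
	return enable_flag ? DEMO_LINK_RATE_UNKNOWN : DEMO_PHY_DISABLED;
}
```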

Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: print PHY RX errors count for later revision of v3 hw

Later revisions of the v3 hw add interrupt coalescing by time for PHY RX
errors. We set the coalesce time to 1s, then print the PHY RX error counts
when PHY RX errors happen, without having to worry about flooding the
log.

Besides, we use hisi_sas_phy.lock to protect the error count values,
because we update them by calling phy_get_events_v3_hw(), which is also
used by the core driver (for the get PHY events function).

We relocate phy_get_events_v3_hw() to avoid a further declaration.

Signed-off-by: Xiaofei Tan <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO

Internal IO and SMP IO each have a time-out timer. The timer handler
checks whether the IO is done according to the flags in
task->task_state_flags, read under task->task_state_lock.

There is an issue which may cause the system to hang: internal IO or SMP
IO is sent, but because of a hardware exception (such as an injected 2-bit
ECC error) the IO neither completes nor times out yet. Meanwhile, a SAS
controller reset occurs to recover the system. It releases the resources
and sets the status of the IO to SAS_TASK_STATE_DONE, so when the IO later
times out, its completion is never signalled and the waiter waits forever.

[  729.123632] Call trace:
[  729.126791] [<ffff00000808655c>] __switch_to+0x94/0xa8
[  729.133106] [<ffff000008d96e98>] __schedule+0x1e8/0x7fc
[  729.138975] [<ffff000008d974e0>] schedule+0x34/0x8c
[  729.144401] [<ffff000008d9b000>] schedule_timeout+0x1d8/0x3cc
[  729.150690] [<ffff000008d98218>] wait_for_common+0xdc/0x1a0
[  729.157101] [<ffff000008d98304>] wait_for_completion+0x28/0x34
[  729.165973] [<ffff000000dcefb4>] hisi_sas_internal_task_abort+0x2a0/0x424 [hisi_sas_test_main]
[  729.176447] [<ffff000000dd18f4>] hisi_sas_abort_task+0x244/0x2d8 [hisi_sas_test_main]
[  729.185258] [<ffff000008971714>] sas_eh_handle_sas_errors+0x1c8/0x7b8
[  729.192391] [<ffff000008972774>] sas_scsi_recover_host+0x130/0x398
[  729.199237] [<ffff00000894d8a8>] scsi_error_handler+0x148/0x5c0
[  729.206009] [<ffff0000080f4118>] kthread+0x10c/0x138
[  729.211563] [<ffff0000080855dc>] ret_from_fork+0x10/0x18

To solve the issue, the task_done callback of those IOs needs to be called
on SAS controller reset.

Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: Change return variable type in phy_up_v3_hw()

According to the tool fortify, phy_up_v3_hw() returns a signed value, while
it should return an unsigned value.

So change variable "res" from int to irqreturn_t.

Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: check for kstrtol() failure

The error handling was unintentionally left out so it introduces a Smatch
static checker warning:

drivers/scsi/qla2xxx/qla_attr.c:1655 qla2x00_port_speed_store()
error: uninitialized symbol 'type'.
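
For illustration only, a user-space sketch of the pattern the warning asks
for: check the parse result before the parsed value is read.
demo_parse_long() is a hypothetical analogue of kstrtol() built on
strtol(), and demo_port_speed_store() is not the driver's function:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical analogue of kstrtol(): returns 0 and stores the value on
 * success, or a negative errno on failure. One trailing newline is
 * accepted, as is common for sysfs input.
 */
static int demo_parse_long(const char *s, int base, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(s, &end, base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	*out = val;
	return 0;
}

/* Store handler sketch: 'type' is only read after a successful parse. */
static int demo_port_speed_store(const char *buf, long *speed)
{
	long type;
	int rval;

	rval = demo_parse_long(buf, 10, &type);
	if (rval)
		return rval;	/* the check that was left out */

	*speed = type;
	return 0;
}
```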

Fixes: a7b9ca7fc87a ("scsi: qla2xxx: Add support for setting port speed")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: fix 32-bit format string warning

On 32-bit architectures, we see a warning when %ld is used to print a
size_t:

In file included from drivers/scsi/lpfc/lpfc_init.c:62:
drivers/scsi/lpfc/lpfc_init.c: In function 'lpfc_new_io_buf':
drivers/scsi/lpfc/lpfc_logmsg.h:62:45: error: format '%ld' expects argument of type 'long int', but argument 5 has type 'unsigned int' [-Werror=format=]

This is harmless, but portable code should just use %zd to avoid the
warning.
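
A small user-space sketch of the portable form, assuming a hypothetical
helper name. Note that it uses %zu, the unsigned counterpart of the %zd
mentioned above, since size_t is unsigned:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t with the 'z' length modifier, which matches size_t on
 * both 32-bit and 64-bit targets (unlike %ld or %lu).
 */
static int demo_format_size(char *dst, size_t dst_len, size_t value)
{
	return snprintf(dst, dst_len, "size=%zu", value);
}
```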

Fixes: 0794d601d174 ("scsi: lpfc: Implement common IO buffers between NVME and SCSI")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: fix unused variable warning

The newly introduced 'cpu' variable is only used inside of an optional
block, so we get a warning without CONFIG_SCSI_LPFC_DEBUG_FS:

drivers/scsi/lpfc/lpfc_nvme.c: In function 'lpfc_nvme_io_cmd_wqe_cmpl':
drivers/scsi/lpfc/lpfc_nvme.c:968:30: error: unused variable 'cpu' [-Werror=unused-variable]
uint32_t code, status, idx, cpu;

Move the declaration into the same block to avoid the warning.

Fixes: 63df6d637e33 ("scsi: lpfc: Adapt cpucheck debugfs logic to Hardware Queues")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: target: tcmu: Switch to bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating. Besides
that, it returns a pointer of bitmap type instead of an opaque void *.
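
To show the idea outside the kernel, here is a user-space sketch of what
such a helper does: round the bit count up to whole longs and return a
zeroed, typed pointer. The demo_* names are hypothetical and this is not
the kernel implementation:

```c
#include <limits.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold 'n' bits, rounded up. */
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Allocate a zeroed bitmap of 'nbits' bits; returns NULL on failure. */
static unsigned long *demo_bitmap_zalloc(size_t nbits)
{
	return calloc(DEMO_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

static void demo_bitmap_free(unsigned long *bitmap)
{
	free(bitmap);
}
```

The caller states a size in bits and gets back unsigned long *, so the
intent is visible at the call site.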

Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Acked-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: libiscsi: fall back to sendmsg for slab pages

In the "XFS over network block device" scenario, XFS can create IO requests
with slab-based XFS metadata. While processing such requests,
tcp_sendpage() can merge skb fragments with neighbouring slab objects.

If the receiving side is located on the same host, tcp_recvmsg() can
trigger a BUG_ON in the hardening check and crash the host with the
following message:

usercopy: kernel memory exposure attempt detected
from XXXXXXXX (kmalloc-512) (1024 bytes)

This patch redirects such requests from the sendpage to the sendmsg path.
The problem is similar to the one described in recent commit 7e241f647dc7
("libceph: fall back to sendmsg for slab pages").

Signed-off-by: Vasily Averin <[email protected]>
Acked-by: Chris Leech <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: avoid printf format warning

Depending on the target architecture and configuration, both phys_addr_t
and dma_addr_t may be smaller than 'long long', so we get a warning when
printing either of them using the %llx format string:

drivers/scsi/qla2xxx/qla_iocb.c: In function 'qla24xx_walk_and_build_prot_sglist':
drivers/scsi/qla2xxx/qla_iocb.c:1140:46: error: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type 'dma_addr_t' {aka 'unsigned int'} [-Werror=format=]
         "%s: page boundary crossing (phys=%llx len=%x)\n",
                                           ~~~^
                                           %x
         __func__, sle_phys, sg->length);
                   ~~~~~~~~
drivers/scsi/qla2xxx/qla_iocb.c:1180:29: error: format '%llx' expects argument of type 'long long unsigned int', but argument 7 has type 'dma_addr_t' {aka 'unsigned int'} [-Werror=format=]
        "%s: sg[%x] (phys=%llx sglen=%x) ldma_sg_len: %x dif_bundl_len: %x ldma_needed: %x\n",
                          ~~~^

There are special %pad and %pap format strings in Linux that we could use
here, but since the driver already does 64-bit arithmetic on the values,
using a plain 'u64' seems more consistent here.
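
A user-space sketch of the approach, assuming a configuration where the
address type is 32 bits wide. demo_dma_addr_t and the helper are
hypothetical; the point is only that the value is widened to a 64-bit
variable before it meets %llx:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: models a config where dma_addr_t is only 32 bits. */
typedef uint32_t demo_dma_addr_t;

/*
 * Widen the address into an unsigned long long first, so the argument
 * type matches %llx regardless of how wide demo_dma_addr_t is.
 */
static int demo_format_phys(char *dst, size_t len, demo_dma_addr_t addr,
			    unsigned int sg_len)
{
	unsigned long long phys = addr;

	return snprintf(dst, len, "phys=%llx len=%x", phys, sg_len);
}
```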

Note: A possible related issue may be that the driver possibly checks the
wrong kind of overflow: when an IOMMU is in use, buffers that cross a
32-bit boundary in physical addresses would still be mapped into dma
addresses within the low 4GB space, so I suspect that we actually want to
check sg_dma_address() instead of sg_phys() here.

Fixes: 50b812755e97 ("scsi: qla2xxx: Fix DMA error when the DIF sg buffer crosses 4GB boundary")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset

The patch that replaced io channels for hdw_queues now reports the
following static checker warning:

drivers/scsi/lpfc/lpfc_init.c:11136 lpfc_sli4_hba_unset()
error: we previously assumed 'phba->pport' could be null (see line 11074)

Resolve by adding a pport NULL check.

[mkp: tag tweak]

Fixes: cdb42becdd40 ("scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Correct __lpfc_sli_issue_iocb_s4 lockdep check

The outer routine lpfc_sli_issue_iocb(), which decomposes into the
SLI3 (s3) or SLI4 (s4) subroutines, takes out the locks. For s3, it takes
out the hbalock. For s4, it takes out the ring_lock. The lockdep checks in
the s3 and s4 subroutines both check hbalock, which is incorrect for s4.

Revise the s4 subroutine to lockdep check the ring_lock.

Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs: hisi: fix ufs_hba_variant_ops passing

Without CONFIG_OF, the of_match_node() helper does not evaluate its
argument, and the compiler warns about the unused variable:

drivers/scsi/ufs/ufs-hisi.c: In function 'ufs_hisi_probe':
drivers/scsi/ufs/ufs-hisi.c:673:17: error: unused variable 'dev' [-Werror=unused-variable]

Rework this code to pass the data directly, and while we're at it,
correctly handle the const pointers.

Fixes: 653fcb07d95e ("scsi: ufs: Add HI3670 SoC UFS driver support")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Avri Altman <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Fix panic in qla_dfs_tgt_counters_show

When trying to display tgt_counters in the debugfs, a panic can result.

There is no null check for qpair after it is assigned in the for-loop.
Unless vha->hw->queue_pair_map array is completely filled with entries, the
system will panic dereferencing a null pointer.
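
A minimal sketch of the guarded loop, using a hypothetical struct and
counter name rather than the driver's own types. Empty slots in the map
are skipped instead of dereferenced:

```c
#include <stddef.h>

/* Hypothetical stand-in for a queue pair with one counter. */
struct demo_qpair {
	unsigned long counter;
};

/* Sum a counter over a sparsely populated map of queue pairs. */
static unsigned long demo_sum_counters(struct demo_qpair *const *map,
				       size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct demo_qpair *qpair = map[i];

		if (!qpair)
			continue;	/* slot not populated */
		total += qpair->counter;
	}
	return total;
}
```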

Signed-off-by: Bill Kuzeja <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: megaraid_sas: reduce module load time

megaraid_sas takes 1+ seconds to load while waiting for firmware:

[2.822603] megaraid_sas 0000:03:00.0: Waiting for FW to come to ready state
[3.871003] megaraid_sas 0000:03:00.0: FW now in Ready state

This is due to the following loop in megasas_transition_to_ready(), which
waits a minimum of 1 second, even though the FW becomes ready in tens of
millisecs:

        /*
         * The cur_state should not last for more than max_wait secs
         */
        for (i = 0; i < max_wait; i++) {
                ...
                msleep(1000);
        ...
        dev_info(&instance->pdev->dev, "FW now in Ready state\n");

This is a regression, caused by a change of the msleep granularity from 1
to 1000 due to concern about waiting too long on systems with coarse
jiffies.

To fix, increase iterations and use msleep(20), which results in:

[2.670627] megaraid_sas 0000:03:00.0: Waiting for FW to come to ready state
[2.739386] megaraid_sas 0000:03:00.0: FW now in Ready state
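
The effect of the polling step can be modelled in user space. In this
sketch the sleep is replaced by a simulated clock, and 'ready_at_ms' is a
made-up parameter standing in for the moment the firmware becomes ready;
none of this is the driver's code:

```c
/*
 * Model a poll loop that checks the state, then sleeps 'step_ms'.
 * The iteration count is scaled so the overall timeout stays at
 * 'max_wait_secs' whatever the step is.
 * Returns the simulated elapsed time in ms when readiness is observed,
 * or -1 on timeout.
 */
static long demo_wait_ready(unsigned int max_wait_secs,
			    unsigned int step_ms,
			    unsigned int ready_at_ms)
{
	unsigned long iterations =
		(unsigned long)max_wait_secs * 1000 / step_ms;
	unsigned long elapsed = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++) {
		if (elapsed >= ready_at_ms)
			return (long)elapsed;
		elapsed += step_ms;	/* stands in for msleep(step_ms) */
	}
	return -1;
}
```

With firmware that becomes ready after 70 ms, a 1000 ms step observes it
at 1000 ms, while a 20 ms step observes it at 80 ms.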

Fixes: fb2f3e96d80f ("scsi: megaraid_sas: Fix msleep granularity")
Signed-off-by: Steve Sistare <[email protected]>
Acked-by: Sumit Saxena <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: target: tcmu: wait for nl reply only if there are listeners or during an add

genlmsg_multicast_allns now returns the correct statuses when a message is
sent to a listener. However, in the case of adding a device we want to wait
for the listener, otherwise we may miss the device during startup.

Signed-off-by: Cathy Avery <[email protected]>
Acked-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: virtio_scsi: don't send sc payload with tmfs

The virtio scsi spec defines struct virtio_scsi_ctrl_tmf as a set of
device-readable records and a single device-writable response entry:

    struct virtio_scsi_ctrl_tmf
    {
        // Device-readable part
        le32 type;
        le32 subtype;
        u8 lun[8];
        le64 id;
        // Device-writable part
        u8 response;
    }

The above should be organised as two descriptor entries (or potentially
more if using VIRTIO_F_ANY_LAYOUT), but without any extra data after "le64
id" or after "u8 response".

The Linux driver doesn't respect that, with virtscsi_abort() and
virtscsi_device_reset() setting cmd->sc before calling virtscsi_tmf().  It
results in the original scsi command payload (or writable buffers) added to
the tmf.

This fixes the problem by leaving cmd->sc zeroed out, which makes
virtscsi_kick_cmd() add the tmf to the control vq without any payload.
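
To make the two-part layout concrete, here is a user-space sketch that
mirrors the struct quoted above with fixed-width types and computes the
size of each part. The struct and helper names are hypothetical, and
endianness is ignored:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the spec struct, using host-endian types. */
struct demo_virtio_scsi_ctrl_tmf {
	/* Device-readable part */
	uint32_t type;
	uint32_t subtype;
	uint8_t lun[8];
	uint64_t id;
	/* Device-writable part */
	uint8_t response;
};

/* Bytes the device reads: everything before 'response'. */
static size_t demo_tmf_readable_len(void)
{
	return offsetof(struct demo_virtio_scsi_ctrl_tmf, response);
}

/* Bytes the device writes: just the response byte. */
static size_t demo_tmf_writable_len(void)
{
	return sizeof(((struct demo_virtio_scsi_ctrl_tmf *)0)->response);
}
```

A tmf request built this way carries 24 readable bytes and 1 writable
byte, with nothing appended to either part.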

Cc: [email protected]
Signed-off-by: Felipe Franciosi <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: smartpqi: Reporting 'logical unit failure'

When the HARDWARE_ERROR/0x3e/0x1 case is triggered, the logical volume
is offlined. When reading the kernel log, the reason why the device
got offlined isn't reported to the user. This situation makes it
difficult for admins to root cause.

Log a message when this condition occurs.

[mkp: tweaked commit message]

Signed-off-by: Erwan Velu <[email protected]>
Acked-by: Don Brace <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: cxgb4i: validate tcp sequence number only if chip version <= T5

T6 adapters generate a DDP completion message on receiving all iSCSI PDUs
in a sequence. Because of this, the driver cannot keep track of the TCP
sequence number for T6 adapters.

Signed-off-by: Varun Prakash <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: cxgb4i: get pf number from lldi->pf

Instead of using viid to get pf number, directly get pf number from
lldi->pf.

Signed-off-by: Varun Prakash <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c

We had a test-report where, under memory pressure, adding LUNs to the
systems would fail (the tests add LUNs strictly in sequence):

[ 5525.853432] scsi 0:0:1:1088045124: Direct-Access     IBM      2107900          .148 PQ: 0 ANSI: 5
[ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
[ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
[ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
[ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
[ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
[ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
[ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5525.857838]  sdk: sdk1
[ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
[ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
[ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
there seems to be no reason to use GFP_ATOMIC here. All the different
call-contexts use a mutex at some point, and nothing in between
requires no sleeping, as far as I could see. Additionally, the code that
later allocates the block queue for the device (scsi_mq_alloc_queue())
already uses GFP_KERNEL.

There are similar allocations in two other functions:
scsi_probe_and_add_lun() and scsi_add_lun(), that can also be done with
GFP_KERNEL.

Here are the contexts for the three functions so far:

    scsi_alloc_sdev()
        scsi_probe_and_add_lun()
            scsi_sequential_lun_scan()
                __scsi_scan_target()
                    scsi_scan_target()
                        mutex_lock()
                    scsi_scan_channel()
                        scsi_scan_host_selected()
                            mutex_lock()
            scsi_report_lun_scan()
                __scsi_scan_target()
                 ...
            __scsi_add_device()
                mutex_lock()
            __scsi_scan_target()
                ...
        scsi_report_lun_scan()
            ...
        scsi_get_host_dev()
            mutex_lock()

    scsi_probe_and_add_lun()
        ...

    scsi_add_lun()
        scsi_probe_and_add_lun()
            ...

So replace all these, and give them a bit of a better chance to succeed,
with more chances of reclaim.

Signed-off-by: Benjamin Block <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: mpt3sas: Add missing breaks in switch statements

Fix the following warnings by adding the proper missing breaks:

drivers/scsi/mpt3sas/mpt3sas_base.c: In function '_base_display_OEMs_branding':
drivers/scsi/mpt3sas/mpt3sas_base.c:3548:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
    switch (ioc->pdev->subsystem_device) {
    ^~~~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3566:3: note: here
   case MPI2_MFGPAGE_DEVID_SAS2308_2:
   ^~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3567:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
    switch (ioc->pdev->subsystem_device) {
    ^~~~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3601:3: note: here
   case MPI25_MFGPAGE_DEVID_SAS3008:
   ^~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3735:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
    switch (ioc->pdev->subsystem_device) {
    ^~~~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3745:3: note: here
   case MPI2_MFGPAGE_DEVID_SAS2308_2:
   ^~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3746:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
    switch (ioc->pdev->subsystem_device) {
    ^~~~~~
drivers/scsi/mpt3sas/mpt3sas_base.c:3768:3: note: here
   default:
   ^~~~~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
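
The shape of the problem is a nested switch that is not followed by a
break, so control falls into the next outer case. A reduced sketch with
made-up device numbers and names (not the mpt3sas tables):

```c
/* Return a branding string for a hypothetical device/subsystem pair. */
static const char *demo_branding(int device, int subsystem)
{
	const char *name = "unknown";

	switch (device) {
	case 1:
		switch (subsystem) {
		case 10:
			name = "A10";
			break;
		default:
			name = "A-generic";
			break;
		}
		break;	/* without this, control falls into case 2 */
	case 2:
		switch (subsystem) {
		case 20:
			name = "B20";
			break;
		default:
			name = "B-generic";
			break;
		}
		break;
	default:
		break;
	}
	return name;
}
```

If the break after the first inner switch were missing, device 1 would
end up reported with a "B" name.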

Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: aacraid: Fix missing break in switch statement

Add missing break statement and fix indentation issue.

This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.

Fixes: 9cb62fa24e0d ("aacraid: Log firmware AIF messages")
Cc: [email protected]
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: kill command serial number

No users left, kill it.

Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: csiostor: drop serial_number usage

Use request tag instead of the serial number when printing out logging
messages.

Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: mvumi: use request tag instead of serial_number

Use the request tag for logging instead of the scsi command serial number.

Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: dpt_i2o: remove serial number usage

Drop references to scsi_cmnd->serial_number.

Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: st: osst: Remove negative constant left-shifts

Negative constant left-shift is undefined behaviour in the C standard, and
as such newer versions of clang (at least) warn against it. GCC has
supported it for a long time, but it would be better to remove it and rely
on defined behaviour.

My understanding is "~(-1 << N)" in 2's complement is intended to generate
a bit pattern of zeroes ending with N '1' bits. The same can be achieved by
"(1 << N) - 1" in a well-defined way, so switch to it to remove the
warning.
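
A small sketch of the well-defined form. It uses an unsigned constant
(1u), a slight variation on the expression above, and assumes N is
smaller than the width of unsigned int; the helper name is hypothetical:

```c
/*
 * Build a mask with the low 'n' bits set, without shifting a negative
 * value. Only valid for n smaller than the bit width of unsigned int.
 */
static unsigned int demo_low_bits_mask(unsigned int n)
{
	return (1u << n) - 1u;
}
```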

Tested: building a kernel with generic SCSI tape, and checking basic
operations (mt status, mt eject) on a real LTO unit. Cannot test the osst
driver.

Signed-off-by: Iustin Pop <[email protected]>
Reviewed-by: Lee Duncan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs-bsg: Allow reading descriptors

Add this functionality, placing the descriptor being read in the actual
data buffer in the bio.

That is, for both read and write descriptors query upiu, we are using the
job's request_payload. This in turn, is mapped back in user land to the
applicable sg_io_v4 xferp: dout_xferp for write descriptor, and din_xferp
for read descriptor.

Signed-off-by: Avri Altman <[email protected]>
Reviewed-by: Evan Green <[email protected]>
Reviewed-by: Bean Huo <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs: Allow reading descriptor via raw upiu

Allow reading descriptors via raw upiu. This in fact was forbidden just as
a precaution, as ufs-bsg actually enforces which functionality is
supported.

Signed-off-by: Avri Altman <[email protected]>
Reviewed-by: Evan Green <[email protected]>
Reviewed-by: Bean Huo <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs-bsg: Change the calling convention for write descriptor

When we had a write descriptor query upiu, we appended the descriptor right
after the bsg request. This was fine as the bsg driver allows allocating
whatever buffer we needed in its job request.

Still, the proper way to deliver payload, however small (we only write
config descriptors of 144 bytes), is by using the job request payload data
buffer.

So change this ABI now, while ufs-bsg is still new, and nobody is actually
using it.

Signed-off-by: Avri Altman <[email protected]>
Reviewed-by: Evan Green <[email protected]>
Reviewed-by: Bean Huo <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs: Remove unused device quirks

The UFSHC driver defines a few quirks that are not used anywhere:

UFS_DEVICE_QUIRK_BROKEN_LCC
UFS_DEVICE_NO_VCCQ
UFS_DEVICE_QUIRK_NO_LINK_OFF
UFS_DEVICE_NO_FASTAUTO

Let's remove them.

Acked-by: Avri Altman <[email protected]>
Acked-by: Alim Akhtar <[email protected]>
Reviewed-by: Evan Green <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

Revert "scsi: ufs: disable vccq if it's not needed by UFS device"

This reverts commit 60f0187031c05e04cbadffb62f557d0ff3564490.

There was one conflict in drivers/scsi/ufs/ufshcd.c

<<<<<<< HEAD
    /* Init check for device descriptor sizes */
    ufshcd_init_desc_sizes(hba);

    ret = ufs_get_device_desc(hba, &card);
    if (ret) {
        dev_err(hba->dev, "%s: Failed getting device info. err = %d\n",
                __func__, ret);
        goto out;
    }

    ufs_fixup_device_setup(hba, &card);
    ufshcd_tune_unipro_params(hba);

    ret = ufshcd_set_vccq_rail_unused(hba,
            (hba->dev_quirks & UFS_DEVICE_NO_VCCQ) ? true : false);
    if (ret)
        goto out;

=======
    ufs_advertise_fixup_device(hba);
>>>>>>> parent of 60f0187031c0... scsi: ufs: disable vccq if it's not needed by UFS device

Resolution: keep HEAD, and delete the ufshcd_set_vccq_rail_unused() call
and corresponding error-handling code.

Clean up loose ends in a follow-up patch.

60f0187031c0 introduced a small power optimization: ignore the vccq load
specified in the UFSHC DT node when said host controller is connected to
specific Flash chips (currently, Samsung and Hynix).

Unfortunately, this optimization breaks UFS on systems where vccq powers
not only the Flash chip, but the host controller as well, such as APQ8098
MEDIABOX or MTP8998:

[    3.929877] ufshcd-qcom 1da4000.ufshc: ufshcd_query_attr: opcode 0x04 for idn 13 failed, index 0, err = -11
[    5.433815] ufshcd-qcom 1da4000.ufshc: ufshcd_query_attr: opcode 0x04 for idn 13 failed, index 0, err = -11
[    6.937771] ufshcd-qcom 1da4000.ufshc: ufshcd_query_attr: opcode 0x04 for idn 13 failed, index 0, err = -11
[    6.937866] ufshcd-qcom 1da4000.ufshc: ufshcd_query_attr_retry: query attribute, idn 13, failed with error -11 after 3 retires
[    6.946412] ufshcd-qcom 1da4000.ufshc: ufshcd_disable_auto_bkops: failed to enable exception event -11
[    6.957972] ufshcd-qcom 1da4000.ufshc: dme-peer-get: attr-id 0x1587 failed 3 retries
[    6.967181] ufshcd-qcom 1da4000.ufshc: dme-peer-get: attr-id 0x1586 failed 3 retries
[    6.975025] ufshcd-qcom 1da4000.ufshc: ufshcd_get_max_pwr_mode: invalid max pwm tx gear read = 0
[    6.982755] ufshcd-qcom 1da4000.ufshc: ufshcd_probe_hba: Failed getting max supported power mode
[    8.505770] ufshcd-qcom 1da4000.ufshc: ufshcd_query_flag: Sending flag query for idn 3 failed, err = -11
[   10.009807] ufshcd-qcom 1da4000.ufshc: ufshcd_query_flag: Sending flag query for idn 3 failed, err = -11
[   11.513766] ufshcd-qcom 1da4000.ufshc: ufshcd_query_flag: Sending flag query for idn 3 failed, err = -11
[   11.513861] ufshcd-qcom 1da4000.ufshc: ufshcd_query_flag_retry: query attribute, opcode 5, idn 3, failed with error -11 after 3 retires
[   13.049807] ufshcd-qcom 1da4000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 8 failed, index 0, err = -11
[   14.553768] ufshcd-qcom 1da4000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 8 failed, index 0, err = -11
[   16.057767] ufshcd-qcom 1da4000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 8 failed, index 0, err = -11
[   16.057872] ufshcd-qcom 1da4000.ufshc: ufshcd_read_desc_param: Failed reading descriptor. desc_id 8, desc_index 0, param_offset 0, ret -11
[   16.067109] ufshcd-qcom 1da4000.ufshc: ufshcd_init_icc_levels: Failed reading power descriptor.len = 98 ret = -11
[   37.073787] ufshcd-qcom 1da4000.ufshc: link startup failed 1

In my opinion, the rationale for the original patch is questionable.  If
neither the UFSHC, nor the Flash chip, require any load from vccq, then
that power rail should simply not be specified at all in the DT.

Working around that fact in the driver is detrimental, as evidenced by the
failure to initialize the host controller on MSM8998.

Acked-by: Avri Altman <[email protected]>
Acked-by: Alim Akhtar <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: megaraid_sas: Remove a bunch of set but not used variables

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'wait_and_poll':
drivers/scsi/megaraid/megaraid_sas_fusion.c:936:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_sync_map_info':
drivers/scsi/megaraid/megaraid_sas_fusion.c:1329:6: warning:
variable 'size_sync_info' set but not used [-Wunused-but-set-variable]

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_init_adapter_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:1639:39: warning:
variable 'reg_set' set but not used [-Wunused-but-set-variable]

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_is_prp_possible':
drivers/scsi/megaraid/megaraid_sas_fusion.c:1925:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_make_prp_nvme':
drivers/scsi/megaraid/megaraid_sas_fusion.c:2047:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_build_ldio_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:2620:42: warning:
variable 'req_desc' set but not used [-Wunused-but-set-variable]

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_build_and_issue_cmd_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:3245:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_task_abort_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:4398:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]

drivers/scsi/megaraid/megaraid_sas_fusion.c: In function 'megasas_reset_target_fusion':
drivers/scsi/megaraid/megaraid_sas_fusion.c:4484:25: warning:
variable 'fusion' set but not used [-Wunused-but-set-variable]

They're not used anymore and can be removed.

Signed-off-by: YueHaibing <[email protected]>
Acked-by: Sumit Saxena <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: clean obsolete return values of eh_timed_out

Those are no longer in use since commit 242f9dcb8ba6
("block: unify request timeout handling").

Signed-off-by: Avri Altman <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: sd: Optimal I/O size should be a multiple of physical block size

It was reported that some devices report an OPTIMAL TRANSFER LENGTH of
0xFFFF blocks. That looks bogus, especially for a device with a
4096-byte physical block size.

Ignore OPTIMAL TRANSFER LENGTH if it is not a multiple of the device's
reported physical block size.

To make the sanity checking conditionals more readable--and to
facilitate printing warnings--relocate the checking to a helper
function. No functional change aside from the printks.
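
Only the multiple-of-physical-block-size check is sketched here; the
helper name is hypothetical and the 512-byte logical block size used in
the example is an assumption, not something stated in the report:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a reported OPTIMAL TRANSFER LENGTH (in logical blocks)
 * is usable: its size in bytes must be a multiple of the physical
 * block size.
 */
static bool demo_opt_xfer_usable(uint32_t opt_blocks, uint32_t logical_bs,
				 uint32_t physical_bs)
{
	uint64_t opt_bytes = (uint64_t)opt_blocks * logical_bs;

	if (opt_blocks == 0 || physical_bs == 0)
		return false;	/* nothing reported, or no size to check */

	return opt_bytes % physical_bs == 0;
}
```

With 512-byte logical blocks and a 4096-byte physical block size, 0xFFFF
blocks is rejected while 0x10000 blocks is accepted.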

Cc: <[email protected]>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199759
Reported-by: Christoph Anton Mitterer <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: MAINTAINERS: SCSI initiator and target tweaks

Nic has been absent for a while and target changes now go through the SCSI
tree. To avoid confusion wrt. the NVMe target, clarify that this entry
refers to the SCSI target subsystem.

Also add patchwork links for both SCSI initiator and target.

Signed-off-by: Martin K. Petersen <[email protected]>

scsi: fcoe: make use of fip_mode enum complete

commit 1917d42d14b7 ("fcoe: use enum for fip_mode") introduces a separate
enum for the fip_mode that shall be used during initialisation handling
until it is passed to fcoe_ctlr_link_up to set the initial fip_state. That
change was incomplete and gcc quietly converted in various places between
the fip_mode and the fip_state enum values with implicit enum conversions,
which fortunately cannot cause any issues in the actual code's execution.

clang however warns about these implicit enum conversions in the scsi
drivers. This commit consolidates the use of the two enums, guided by
clang's enum-conversion warnings.

This commit now completes the use of the fip_mode: It expects and uses
fip_mode in {bnx2fc,fcoe}_interface_create and fcoe_ctlr_init, and it calls
fcoe_ctlr_set_state() with the correct values in fcoe_ctlr_link_up(). It
also breaks the association between FIP_MODE_AUTO and FIP_ST_AUTO to
indicate these two enums are distinct.
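
As a rough illustration of keeping the two enums distinct, the sketch
below maps a mode to an initial state through an explicit switch rather
than relying on matching numeric values. All names and values are
assumptions for this example and are not the driver's definitions.

```c
/* Illustrative enums; names and values are assumptions for this sketch. */
enum demo_fip_mode {
	DEMO_MODE_AUTO,
	DEMO_MODE_NON_FIP,
	DEMO_MODE_FABRIC,
};

enum demo_fip_state {
	DEMO_ST_DISABLED,
	DEMO_ST_LINK_WAIT,
	DEMO_ST_AUTO,
	DEMO_ST_NON_FIP,
	DEMO_ST_ENABLED,
};

/* Map the configured mode to an initial state explicitly, with no casts. */
static enum demo_fip_state demo_initial_state(enum demo_fip_mode mode)
{
	switch (mode) {
	case DEMO_MODE_AUTO:
		return DEMO_ST_AUTO;
	case DEMO_MODE_NON_FIP:
		return DEMO_ST_NON_FIP;
	case DEMO_MODE_FABRIC:
		return DEMO_ST_ENABLED;
	}
	return DEMO_ST_DISABLED;
}
```

Because the mapping is spelled out, the numeric values of the two enums no
longer need to agree.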

Link: https://github.com/ClangBuiltLinux/linux/issues/151
Fixes: 1917d42d14b7 ("fcoe: use enum for fip_mode")
Reported-by: Dmitry Golovin <[email protected]>
Original-by: Lukas Bulwahn <[email protected]>
CC: Lukas Bulwahn <[email protected]>
CC: Nick Desaulniers <[email protected]>
CC: Nathan Chancellor <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Suggested-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Sedat Dilek <[email protected]>
Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: megaraid_sas: return error when create DMA pool failed

When creating the DMA pool for cmd frames fails, we should return -ENOMEM
instead of 0.
Otherwise, in some cases, the following can happen:

    megasas_init_adapter_fusion()

    -->megasas_alloc_cmds()
       -->megasas_create_frame_pool
          create DMA pool failed,
        --> megasas_free_cmds() [1]

    -->megasas_alloc_cmds_fusion()
       failed, then goto fail_alloc_cmds.
    -->megasas_free_cmds() [2]

We then call megasas_free_cmds() twice: [1] kfrees cmd_list and
[2] uses cmd_list afterwards. This causes a problem:

Unable to handle kernel NULL pointer dereference at virtual address
00000000
pgd = ffffffc000f70000
[00000000] *pgd=0000001fbf893003, *pud=0000001fbf893003,
*pmd=0000001fbf894003, *pte=006000006d000707
Internal error: Oops: 96000005 [#1] SMP
Modules linked in:
CPU: 18 PID: 1 Comm: swapper/0 Not tainted
task: ffffffdfb9290000 ti: ffffffdfb923c000 task.ti: ffffffdfb923c000
PC is at megasas_free_cmds+0x30/0x70
LR is at megasas_free_cmds+0x24/0x70
...
Call trace:
[<ffffffc0005b779c>] megasas_free_cmds+0x30/0x70
[<ffffffc0005bca74>] megasas_init_adapter_fusion+0x2f4/0x4d8
[<ffffffc0005b926c>] megasas_init_fw+0x2dc/0x760
[<ffffffc0005b9ab0>] megasas_probe_one+0x3c0/0xcd8
[<ffffffc0004a5abc>] local_pci_probe+0x4c/0xb4
[<ffffffc0004a5c40>] pci_device_probe+0x11c/0x14c
[<ffffffc00053a5e4>] driver_probe_device+0x1ec/0x430
[<ffffffc00053a92c>] __driver_attach+0xa8/0xb0
[<ffffffc000538178>] bus_for_each_dev+0x74/0xc8
[<ffffffc000539e88>] driver_attach+0x28/0x34
[<ffffffc000539a18>] bus_add_driver+0x16c/0x248
[<ffffffc00053b234>] driver_register+0x6c/0x138
[<ffffffc0004a5350>] __pci_register_driver+0x5c/0x6c
[<ffffffc000ce3868>] megasas_init+0xc0/0x1a8
[<ffffffc000082a58>] do_one_initcall+0xe8/0x1ec
[<ffffffc000ca7be8>] kernel_init_freeable+0x1c8/0x284
[<ffffffc0008d90b8>] kernel_init+0x1c/0xe4
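
The error propagation can be sketched in user space as below. Every name
is a hypothetical stand-in for the driver routines mentioned above, and
the pool failure is simulated with a flag; the point is only that
returning the error stops the caller before a second cleanup.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins for the driver's routines. */
struct demo_adapter {
	void **cmd_list;
	bool pool_should_fail;	/* test knob: simulate DMA pool failure */
	int free_calls;
};

static void demo_free_cmds(struct demo_adapter *a)
{
	a->free_calls++;
	free(a->cmd_list);
	a->cmd_list = NULL;
}

static int demo_create_frame_pool(struct demo_adapter *a)
{
	/* Report the failure instead of returning 0. */
	return a->pool_should_fail ? -ENOMEM : 0;
}

static int demo_alloc_cmds(struct demo_adapter *a)
{
	int ret;

	a->cmd_list = calloc(4, sizeof(*a->cmd_list));
	if (!a->cmd_list)
		return -ENOMEM;

	ret = demo_create_frame_pool(a);
	if (ret) {
		demo_free_cmds(a);	/* cleanup [1] */
		return ret;		/* propagate the error */
	}
	return 0;
}

static int demo_init_adapter(struct demo_adapter *a)
{
	int ret = demo_alloc_cmds(a);

	if (ret)
		return ret;	/* stop here, so cleanup [2] never runs */

	/* ... later setup steps would go here ... */
	return 0;
}
```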

Signed-off-by: Jason Yan <[email protected]>
Acked-by: Sumit Saxena <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Avoid PCI IRQ affinity mapping when multiqueue is not supported

This patch fixes a warning seen when BLK-MQ is enabled and the hardware
does not support MQ. In that case the driver requests MSI-X vectors which
are equal to or fewer than pre_desc via the PCI IRQ affinity infrastructure.

    [   19.746300] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.12-k.
    [   19.746599] qla2xxx [0000:02:00.0]-001d: : Found an ISP2432 irq 18 iobase 0x(____ptrval____).
    [   20.203186] ------------[ cut here ]------------
    [   20.203306] WARNING: CPU: 8 PID: 268 at drivers/pci/msi.c:1273 pci_irq_get_affinity+0xf4/0x120
    [   20.203481] Modules linked in: tg3 ptp qla2xxx(+) pps_core sg libphy scsi_transport_fc flash loop autofs4
    [   20.203700] CPU: 8 PID: 268 Comm: systemd-udevd Not tainted 5.0.0-rc5-00358-gdf3865f #113
    [   20.203830] Call Trace:
    [   20.203933]  [0000000000461bb0] __warn+0xb0/0xe0
    [   20.204090]  [00000000006c8f34] pci_irq_get_affinity+0xf4/0x120
    [   20.204219]  [000000000068c764] blk_mq_pci_map_queues+0x24/0x120
    [   20.204396]  [00000000007162f4] scsi_map_queues+0x14/0x40
    [   20.204626]  [0000000000673654] blk_mq_update_queue_map+0x94/0xe0
    [   20.204698]  [0000000000676ce0] blk_mq_alloc_tag_set+0x120/0x300
    [   20.204869]  [000000000071077c] scsi_add_host_with_dma+0x7c/0x300
    [   20.205419]  [00000000100ead54] qla2x00_probe_one+0x19d4/0x2640 [qla2xxx]
    [   20.205621]  [00000000006b3c88] pci_device_probe+0xc8/0x160
    [   20.205697]  [0000000000701c0c] really_probe+0x1ac/0x2e0
    [   20.205770]  [0000000000701f90] driver_probe_device+0x50/0x100
    [   20.205843]  [0000000000702134] __driver_attach+0xf4/0x120
    [   20.205913]  [0000000000700644] bus_for_each_dev+0x44/0x80
    [   20.206081]  [0000000000700c98] bus_add_driver+0x198/0x220
    [   20.206300]  [0000000000702950] driver_register+0x70/0x120
    [   20.206582]  [0000000010248224] qla2x00_module_init+0x224/0x284 [qla2xxx]
    [   20.206857] ---[ end trace b1de7a3f79fab2c2 ]---

The fix is to check whether the hardware has Multi Queue capability and, if
it does not, use the pci_alloc_irq_vectors() call instead of
pci_alloc_irq_vectors_affinity().

Fixes: f664a3cc17b7d ("scsi: kill off the legacy IO path")
Cc: [email protected] #4.19
Signed-off-by: Giridhar Malavali <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Update driver version to 10.00.00.14-k

Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Add new FW dump template entry types

This patch adds new firmware dump template entries for ISP27XX firmware
dump.

Signed-off-by: Joe Carnuccio <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Fix code indentation for qla27xx_fwdt_entry

This patch fixes the following checkpatch ERROR:

ERROR: space prohibited before that ',' (ctx:WxW)

No change in functionality due to this patch.

Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Move marker request behind QPair

Current code hard codes the marker request to use request and response
queue 0. This patch makes use of the qpair as the path to access the
request/response queues. It allows the marker to be placed on any hardware
queue.

Signed-off-by: Quinn Tran <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Prevent SysFS access when chip is down

Prevent user from sending commands through sysfs while FW is not running or
reset is in progress.

Signed-off-by: Quinn Tran <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Add support for setting port speed

This patch adds a sysfs node:

1. There is a new sysfs node port_speed
2. The possible values are 2(Auto neg), 8, 16, 32
3. A value outside of the above defaults to Auto neg
4. Any update to the setting causes a link toggle
5. This feature is currently only for ISP27xx
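
Rules 2 and 3 can be sketched as a small normalisation function. The
function and constant names are assumptions for illustration and do not
come from the driver.

```c
/* Hypothetical constant and helper illustrating rules 2 and 3 above. */
#define DEMO_PORT_SPEED_AUTO 2

static int demo_normalize_port_speed(int requested)
{
	switch (requested) {
	case 8:
	case 16:
	case 32:
		return requested;
	default:
		/* 2 and any unsupported value both select auto-negotiation. */
		return DEMO_PORT_SPEED_AUTO;
	}
}
```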

Signed-off-by: Anil Gurumurthy <[email protected]>
Signed-off-by: Quinn Tran <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Prevent multiple ADISC commands per session

Add check to allow 1 discovery command per session to be sent.

Signed-off-by: Quinn Tran <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Check for FW started flag before aborting

For FC-NVMe, if the fw_started flag is not set or fcport is deleted, then
do not send the Abort command.

Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Fix unload when NVMe devices are configured

This patch fixes driver unload issue when FC-NVMe devices are configured.

Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Add First Burst support for FC-NVMe devices

Add Support for First Burst for FC-NVMe protocol. This feature requires
First Burst support in the firmware.

Signed-off-by: Darren Trapp <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Fix LUN discovery if loop id is not assigned yet by firmware

This patch fixes LUN discovery when loop ID is not yet assigned by the
firmware during driver load/sg_reset operations. Driver will now search for
new loop id before retrying login.

Fixes: 48acad099074 ("scsi: qla2xxx: Fix N2N link re-connect")
Cc: [email protected] #4.19
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: remove redundant null check on pointer sess

The null check on pointer sess and the subsequent call is redundant as sess
is null on all the paths that lead to the out_term2 label. Hence the
null check and the call can be removed. Also remove the redundant setting
of sess to NULL as this is not required now.

Detected by CoverityScan, CID#1420663 ("Logically dead code")

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: Move debug messages before sending srb preventing panic

When sending an srb with qla2x00_start_sp, the sp can complete and be freed
by the time we log the debug message saying we sent it. This can cause a
panic if sp gets reused quickly or when running a kernel that poisons freed
memory.

This was partially fixed by (not every case was addressed):

Commit 9fe278f44b4b ("scsi: qla2xxx: Move log messages before issuing
command to firmware")
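
The ordering issue can be shown with a user-space sketch in which the
submit routine frees the request at once, standing in for an immediate
completion. The types and function names are hypothetical; only the
"log first, then submit" ordering reflects the description above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical request type and submit routine for this sketch. */
struct demo_srb {
	int handle;
};

/* Simulates the request completing at once, which frees it. */
static int demo_start_sp(struct demo_srb *sp)
{
	free(sp);
	return 0;
}

static int demo_send(int handle, char *log, size_t log_len)
{
	struct demo_srb *sp = malloc(sizeof(*sp));

	if (!sp)
		return -1;
	sp->handle = handle;

	/* Log while sp is still owned by us ... */
	snprintf(log, log_len, "sending srb handle=%d", sp->handle);

	/* ... because sp must not be touched after this call. */
	return demo_start_sp(sp);
}
```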

Signed-off-by: Bill Kuzeja <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Remove set but not used variable 'phys_id'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/scsi/lpfc/lpfc_init.c: In function 'lpfc_cpu_affinity_check':
drivers/scsi/lpfc/lpfc_init.c:10599:19: warning:
variable 'phys_id' set but not used [-Wunused-but-set-variable]

It has never been used since its introduction in commit 6a828b0f6192
("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware
queues")

Signed-off-by: YueHaibing <[email protected]>
Acked-by: James Smart <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs: Add HI3670 SoC UFS driver support

Add HI3670 SoC UFS driver support by extending the common ufs-hisi
driver. One major difference between HI3660 and HI3670 SoCs in terms of UFS
is the PHY. HI3670 has a 10nm variant PHY and hence this parameter is used
to distinguish the configuration.

Signed-off-by: Manivannan Sadhasivam <[email protected]>
Acked-by: Wei Li <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: dt-bindings: ufs: Add HI3670 UFS controller binding

Add devicetree binding for HI3670 UFS controller. HI3670 SoC is very
similar to HI3660 SoC with almost the same IPs. The only major difference
in terms of UFS is the PHY. HI3670 has a 10nm PHY. But since the original driver
(HI3660 UFS) cannot make HI3670 UFS functional, a separate compatible
is added for HI3670 without any fallback.

Signed-off-by: Manivannan Sadhasivam <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Acked-by: Wei Li <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: fix a handful of indentation issues

There are a handful of statements that are indented incorrectly. Fix these.

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: James Smart <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qlogicpti: Use of_node_name_eq for node name comparisons

Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.

As prom_name is not used for anything else, remove it.

Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: [email protected]
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: sd: Fix typo in sd_first_printk()

Commit b2bff6ceb61a9 ("[SCSI] sd: Quiesce mode sense error messages")
added the macro sd_first_printk(). The macro takes "sdsk" as argument
but dereferences "sdkp". This hasn't caused any real issues since all
callers of sd_first_printk() have an sdkp. But fix the typo.
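
The class of bug is easy to reproduce outside the kernel. The sketch below
uses a hypothetical structure and macro name and shows the corrected form,
where the macro parameter and the name used in the body agree.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the disk structure. */
struct demo_disk {
	bool first_scan;
	int messages;
};

/*
 * Fixed form: the parameter name matches the name used in the body.
 * If the parameter were spelled "sdsk" while the body used "sdkp", the
 * macro would only compile where a variable named sdkp is in scope.
 */
#define demo_first_printk(sdkp)			\
	do {					\
		if ((sdkp)->first_scan)		\
			(sdkp)->messages++;	\
	} while (0)

static int demo_report(struct demo_disk *disk)
{
	/* Works even though the caller's variable is not called sdkp. */
	demo_first_printk(disk);
	return disk->messages;
}
```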

[mkp: Turned this into a real patch and tweaked commit description]

Signed-off-by: Dietmar Hahn <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: megaraid_sas: driver version update

Signed-off-by: Shivasharan S <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: megaraid_sas: Update structures for HOST_DEVICE_LIST DCMD

Add padding to make the structure variables in MR_HOST_DEVICE_LIST_ENTRY
64-bit aligned. Also, add reserved fields to MR_HOST_DEVICE_LIST for
future firmware usage.

Signed-off-by: Shivasharan S <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Fix error code if kcalloc() fails

This should return -ENOMEM if kcalloc() fails, but it accidentally
returns success instead.

Fixes: 6a828b0f6192 ("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Ewan D. Milne <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs: fix a typo in comment

poitner -> pointer.

Signed-off-by: Chengguang Xu <[email protected]>
Reviewed-by: Pedro Sousa <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: scsi_debug: Implement support for write protect

Teach scsi_debug to honor SWP in the Control Mode Page and report the
resulting WP state in the Device-Specific Parameter field.

In check_device_access_params() verify that commands that will write
the medium are permitted to do so.
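
A minimal sketch of such a gate, assuming a per-device flag that mirrors
the SWP bit. The structure, the result codes and the function name are
illustrative and are not taken from scsi_debug.

```c
#include <stdbool.h>

/* Illustrative result codes; not the values scsi_debug uses. */
enum demo_result { DEMO_OK = 0, DEMO_WRITE_PROTECTED = 1 };

struct demo_wp_dev {
	bool swp;	/* software write protect bit from the mode page */
};

/* Reject medium-writing commands while write protect is set. */
static enum demo_result demo_check_access(const struct demo_wp_dev *dev,
					  bool is_write)
{
	if (is_write && dev->swp)
		return DEMO_WRITE_PROTECTED;
	return DEMO_OK;
}
```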

Signed-off-by: Martin K. Petersen <[email protected]>
Acked-by: Douglas Gilbert <[email protected]>

scsi: core: Move resid from scsi_data_buffer to scsi_cmnd

This patch does not change any functionality but reduces the size of
struct scsi_cmnd.
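
A simplified sketch of why callers are unaffected when they go through
accessors: only the accessor bodies need to know where the residual is
stored. The structures below are cut-down assumptions, not the real
definitions.

```c
/* Simplified, hypothetical layouts; the real structures have more fields. */
struct demo_data_buffer {
	unsigned int length;
};

struct demo_cmnd {
	struct demo_data_buffer sdb;
	int resid;	/* residual count kept in the command itself */
};

static inline void demo_set_resid(struct demo_cmnd *cmd, int resid)
{
	cmd->resid = resid;
}

static inline int demo_get_resid(const struct demo_cmnd *cmd)
{
	return cmd->resid;
}
```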

Cc: Douglas Gilbert <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: sd: Remove superfluous residual assignments

Since commit 26e85fcd15f6 ("[SCSI] sd: Permit merged discard requests";
kernel v3.10) sd_done() sets the residual not only for failed special
requests but also for special requests that succeeded. Hence remove the
code from functions called by sd_init_command() that sets the residual.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: uas: Use scsi_[gs]et_resid() where appropriate

This patch does not change any functionality.

Cc: Oliver Neukum <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: scsi_debug: Use scsi_[gs]et_resid() where appropriate

This patch does not change any functionality.

Cc: Douglas Gilbert <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Acked-by: Douglas Gilbert <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: libiscsi: Use scsi_[gs]et_resid() where appropriate

This patch does not change any functionality.

Cc: Lee Duncan <[email protected]>
Cc: Chris Leech <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: scsi_debug: Fix a recently introduced regression

A recent commit removed an element from opcode_info_arr[] but did not
modify opcode_ind_arr[] nor was SDEB_I_XDWRITEREAD removed. Remove
SDEB_I_XDWRITEREAD and bring the two arrays again in sync. This patch
avoids that the following is reported:

BUG: KASAN: null-ptr-deref in scsi_debug_queuecommand+0x60f/0xc90 [scsi_debug]
Read of size 1 at addr 0000000000000001 by task iscsi-test-cu/683
CPU: 3 PID: 683 Comm: iscsi-test-cu Not tainted 5.0.0-rc5-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0x86/0xca
kasan_report.cold.3+0x5/0x3e
__asan_load1+0x47/0x50
scsi_debug_queuecommand+0x60f/0xc90 [scsi_debug]
scsi_queue_rq+0xc17/0x12e0
blk_mq_dispatch_rq_list+0x5fc/0xb10
blk_mq_sched_dispatch_requests+0x2f7/0x300
__blk_mq_run_hw_queue+0xd6/0x180
__blk_mq_delay_run_hw_queue+0x25c/0x290
blk_mq_run_hw_queue+0x119/0x1b0
blk_mq_sched_insert_request+0x274/0x350
blk_execute_rq_nowait+0x78/0x90
blk_execute_rq+0xcc/0x140
sg_io+0x30f/0x700
scsi_cmd_ioctl+0x4d4/0x540
scsi_cmd_blk_ioctl+0x7b/0x8b
sd_ioctl+0xba/0x150
blkdev_ioctl+0x6e1/0xea0
block_ioctl+0x79/0x90
do_vfs_ioctl+0x12b/0x9b0
ksys_ioctl+0x41/0x80
__x64_sys_ioctl+0x43/0x50
do_syscall_64+0x71/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xbe
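
One possible guard against this kind of drift, shown as a sketch and not
as what this patch does, is to tie the table size to the index enum at
build time. The enum, table and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical miniature of an opcode-index enum and its info table. */
enum demo_index {
	DEMO_I_INVALID,
	DEMO_I_READ,
	DEMO_I_WRITE,
	DEMO_I_LAST,	/* number of valid indexes */
};

struct demo_opcode_info {
	unsigned char opcode;
	const char *name;
};

static const struct demo_opcode_info demo_info_arr[] = {
	[DEMO_I_INVALID] = { 0xff, "invalid" },
	[DEMO_I_READ]    = { 0x28, "READ(10)" },
	[DEMO_I_WRITE]   = { 0x2a, "WRITE(10)" },
};

/* Fail the build if the enum and the table drift apart. */
_Static_assert(sizeof(demo_info_arr) / sizeof(demo_info_arr[0]) == DEMO_I_LAST,
	       "demo_info_arr out of sync with enum demo_index");

static const struct demo_opcode_info *demo_lookup(enum demo_index idx)
{
	if ((size_t)idx >= sizeof(demo_info_arr) / sizeof(demo_info_arr[0]))
		return NULL;
	return &demo_info_arr[idx];
}
```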

Cc: Christoph Hellwig <[email protected]>
Cc: Douglas Gilbert <[email protected]>
Fixes: ae3d56d81507 ("scsi: remove bidirectional command support")
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: Do some more tidy-up

Do some very minor tidy-up, for things like needlessly initialising
variables and not leaving whitespace before quote endings.

Originally-from: Xiang Chen <[email protected]>
Originally-from: Luo Jiaxing <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental

For auto-control irq affinity mode, choose the dq to deliver IO according
to the current CPU.

This reduces the performance regression seen when fio and the CQ interrupts
are processed on different nodes.

For user control irq affinity mode, keep it as before.

To realize it, also need to distinguish the usage of dq lock and sas_dev
lock.

We mark as experimental due to ongoing discussion on managed MSI IRQ
during hotplug:
https://marc.info/?l=linux-scsi&m=154876335707751&w=2

We're almost at the point where we can expose multiple queues to the upper
layer for SCSI MQ, but we need to sort out the per-HBA tags performance
issue.

Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: Issue internal abort on all relevant queues

To support queue mapped to a CPU, it needs to be ensured that issuing an
internal abort is safe, in that it is guaranteed that an internal abort is
processed for a single IO or a device after all the relevant command(s)
which it is attempting to abort have been processed by the controller.

Currently we only deliver commands for any device on a single queue to
solve this problem, as we know that commands issued on the same queue will
be processed in order, and we will not have a scenario where the internal
abort is racing against a command(s) which it is trying to abort.

To enqueue commands on a queue mapped to a CPU, choosing a queue for a
command is based on the associated queue for the current CPU, so this is
not safe for internal abort since it would definitely not be guaranteed
that commands for the same device are issued on the same queue.

To solve this issue, we take a bludgeoning approach, and issue a separate
internal abort on any queue(s) relevant to the command or device, in that
we will be guaranteed that at least one of these internal aborts will be
received last in the controller.

So, for aborting a single command, we can just force the internal abort to
be issued on the same queue as the command which we are trying to abort.

For aborting all commands associated with a device, we issue a separate
internal abort on all relevant queues. Issuing multiple internal aborts in
this fashion would have no side effect.
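
The device-wide case might be sketched as follows, assuming the driver can
tell which queues have carried commands for a device. The bitmask
bookkeeping and all names are assumptions for illustration; the patch may
select the relevant queues differently.

```c
#include <stdint.h>

#define DEMO_MAX_QUEUES 16

/* Hypothetical per-device record of which queues carried its commands. */
struct demo_device {
	uint32_t used_queues;	/* bit n set => queue n was used */
};

static void demo_note_command(struct demo_device *dev, unsigned int queue)
{
	if (queue < DEMO_MAX_QUEUES)
		dev->used_queues |= (uint32_t)1 << queue;
}

/*
 * Collect the queues on which a separate internal abort would be issued
 * for this device. Returns the number of entries written to out[].
 */
static unsigned int demo_abort_queues(const struct demo_device *dev,
				      unsigned int out[DEMO_MAX_QUEUES])
{
	unsigned int q, n = 0;

	for (q = 0; q < DEMO_MAX_QUEUES; q++)
		if (dev->used_queues & ((uint32_t)1 << q))
			out[n++] = q;
	return n;
}
```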

Signed-off-by: John Garry <[email protected]>
Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: change queue depth from 512 to 4096

If sending IOs to many disks from single queue, it is possible that the
queue may be full. To avoid the situation, change queue depth from 512 to
4096 which is the max number of IOs for v3 hw.

Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: Add manual trigger for debugfs dump

Add an interface to manually trigger a debugfs dump.

Signed-off-by: Luo Jiaxing <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hisi_sas: Add support for DIX feature for v3 hw

This patch adds support for DIX to v3 hw driver.

For this, we build upon support for DIF, most significantly adding new
DMA map and unmap paths.

Some pre-existing macro precedence issues are also tidied. They were
detected by checkpatch --strict.

Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: dt-bindings: ufs: Fix the compatible string definition

If you look at the bindings for the UFS Host Controller it says:

- compatible: must contain "jedec,ufs-1.1" or "jedec,ufs-2.0", may
              also list one or more of the following:
                 "qcom,msm8994-ufshc"
                 "qcom,msm8996-ufshc"
                 "qcom,ufshc"

My reading of that is that it's fine to just have either of these:
1. "qcom,msm8996-ufshc", "jedec,ufs-2.0"
2. "qcom,ufshc", "jedec,ufs-2.0"

As far as I can tell neither of the above is actually a good idea.

For #1 it turns out that the driver currently only keys off the
compatible string "qcom,ufshc" so it won't actually probe.

For #2 the driver will probe but it's not a good idea to keep the SoC
name out of the compatible string.

Let's update the compatible string to make it really explicit.  We'll
include a nod to the existing driver and the old binding and say that
we should always include the "qcom,ufshc" string in addition to the
SoC compatible string.

While we're at it we'll also include another example SoC known to have
UFS: sdm845.

Fixes: 47555a5c8a11 ("scsi: ufs: make the UFS variant a platform device")
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Vivek Gautam <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ata: Use unsigned int for cmd's type in ioctls in scsi_host_template

Clang warns several times in the scsi subsystem (trimmed for brevity):

drivers/scsi/hpsa.c:6209:7: warning: overflow converting case value to
switch condition type (2147762695 to 18446744071562347015) [-Wswitch]
        case CCISS_GETBUSTYPES:
             ^
drivers/scsi/hpsa.c:6208:7: warning: overflow converting case value to
switch condition type (2147762694 to 18446744071562347014) [-Wswitch]
        case CCISS_GETHEARTBEAT:
             ^

The root cause is that the _IOC macro can generate really large numbers,
which don't fit into type 'int', which is used for the cmd parameter in
the ioctls in scsi_host_template. My research into how GCC and Clang are
handling this at a low level didn't prove fruitful. However, looking at
the rest of the kernel tree, all ioctls use an 'unsigned int' for the
cmd parameter, which will fit all of the _IOC values in the scsi/ata
subsystems.

Make that change because none of the ioctls expect a negative value for
any command, it brings the ioctls in line with the rest of the kernel,
and it removes ambiguity, which is never good when dealing with compilers.
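
The numbers in the first warning are consistent with a sign extension, as
the sketch below shows. This is only an arithmetic illustration with
hypothetical helper names, not an account of what either compiler does
internally.

```c
#include <stdint.h>

/* Value taken from the first warning quoted above. */
#define DEMO_CMD 2147762695u

/*
 * What a signed 32-bit cmd looks like once widened to 64 bits. Converting
 * an out-of-range value to int32_t is implementation-defined; GCC and
 * Clang wrap modulo 2^32.
 */
static uint64_t demo_widen_signed(uint32_t cmd)
{
	return (uint64_t)(int64_t)(int32_t)cmd;
}

/* With an unsigned parameter the value is preserved when widened. */
static uint64_t demo_widen_unsigned(uint32_t cmd)
{
	return (uint64_t)cmd;
}
```

The command value exceeds INT_MAX (2147483647), so it only keeps its
meaning when the parameter is unsigned.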

Link: https://github.com/ClangBuiltLinux/linux/issues/85
Link: https://github.com/ClangBuiltLinux/linux/issues/154
Link: https://github.com/ClangBuiltLinux/linux/issues/157
Signed-off-by: Nathan Chancellor <[email protected]>
Acked-by: Bradley Grove <[email protected]>
Acked-by: Don Brace <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Update lpfc version to 12.2.0.0

Update lpfc version to 12.2.0.0

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Update 12.2.0.0 file copyrights to 2019

For files modified as part of 12.2.0.0 patches, update copyright to 2019

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Fix nvmet issues when link bounce under IO load

Various null pointer dereference and general protection fault panics occur
when there is a link bounce under load. There are a large number of "error"
6413 messages indicating "bad release".

The issues resolve to list corruptions due to missing or inconsistent lock
protection. Lockups are due to nested locks in the unsolicited abort
path. The unsolicited abort path calls the wrong abort processing
routine. There was also duplicate context release while aborts were still
active in the hardware.

Removed duplicate locks and added lock protection around list item
removal. Commonized lock handling around the abort processing routines.
Prevent context release while still in ABTS list.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Correct upcalling nvmet_fc transport during io done downcall

When the transport calls into the lpfc target to release an IO job
structure, which corresponds to an exchange, and if the driver was waiting
for an exchange in order to post a previously received command to the
transport, the driver immediately takes the IO job and reuses the context
for the prior command and calls nvmet_fc_rcv_fcp_req() to tell the
transport about a newly received command.

The problem is that the execution of the IO job release may be in the
context of the back end driver and its bio completion handlers, thus it may
be in an irq context and protection code kicks in in the bio and request
layers that are subsequently called.

Rework lpfc so that instead of immediately upcalling, queue it to a
deferred work thread and have the thread make the upcall.

Took advantage of this change to remove duplicated code with the normal
command receive path that preps the IO job and upcalls nvmet_fc. Created a
common routine both paths use.

Also corrected some errors that were found during review of the context
freeing and reuse - basically unlocked operations and a somewhat disjoint
set of calls to release associated job elements. Cleaned up this path and
added locks for coherency.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Fix default driver parameter collision for allowing NPIV support

The conversion to enable SCSI and NVME fc4 support ran into an issue with
NPIV support. With NVME, NPIV is not currently supported, but with SCSI it
was. The driver reverted to its lowest setting meaning NPIV with SCSI was
not allowed.

Convert the NPIV checks and implementation so that SCSI can continue to
allow NPIV support.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Rework locking on SCSI io completion

A scsi host lock is taken on every io completion to check whether the abort
handler is waiting on the io completion. This is an expensive lock to take
on every completion when an abort condition is rare.

Replace scsi host lock with command-specific lock. Synchronize completion
and abort paths by new cmd lock. Ensure all flag changing and nulling of
context pointers taken under lock. When adding lock to task management
abort, realized it was missing other synchronization locks. Added that
synchronization to match normal paths.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Enable SCSI and NVME fc4s by default

Now that performance mods don't split resources by protocol and enable both
protocols by default, there's no reason not to enable concurrent SCSI and
NVME fc4 support.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Resize cpu maps structures based on possible cpus

The work done to date utilized the number of present cpus when sizing
per-cpu structures. Structures should have been sized based on the max
possible cpu count.

Convert the driver over to possible cpu count for sizing allocation.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Utilize new IRQ API when allocating MSI-X vectors

The current driver uses the older IRQ API for MSI-X allocation.

Change driver to utilize pci_alloc_irq_vectors when allocating IRQ vectors.

Make lpfc_cpu_affinity_check use pci_irq_get_affinity to determine how the
kernel mapped all the IRQs.

Remove msix_entries from SLI4 structure, replaced with pci_irq_vector()
usage.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Rework EQ/CQ processing to address interrupt coalescing

When driving high iop counts, auto_imax coalescing kicks in and drives the
performance to extremely small iops levels.

There are two issues:

1) auto_imax is enabled by default. The auto algorithm, when iops gets
    high, divides the iops by the hdwq count and uses that value to
    calculate EQ_Delay. The EQ_Delay is set uniformly on all EQs whether
    they have load or not. The EQ_delay is only manipulated every 5s (a
    long time). Thus there were large 5s swings of no interrupt delay
    followed by large/maximum delay, before repeating.

2) When processing a CQ, the driver got mixed up on the rate of when
    to ring the doorbell to keep the chip apprised of the eqe or cqe
    consumption as well as how long to sit in the thread and
    process queue entries. Currently, the driver capped its work at
    64 entries (very small) and exited/rearmed the CQ.  Thus, on heavy
    loads, additional overheads were taken to exit and re-enter the
    interrupt handler. Worse, if in the large/maximum coalescing
    windows, it could be a while before getting back to servicing.

The issues are corrected by the following:

- A change in defaults. Auto_imax is turned OFF and fcp_imax is set
   to 0. Thus all interrupts are immediate.

- Cleanup of field names and their meanings. Existing names were
   non-intuitive or used for duplicate things.

- Added max_proc_limit field, to control the length of time the
   handlers would service completions.

- Reworked EQ handling:
    Added common routine that walks eq, applying notify interval and max
      processing limits. Use queue_claimed to claim ownership of the queue
      while processing. Always rearm the queue whenever the common routine
      is called.
    Rework queue element processing, namely to eliminate hba_index vs
      host_index. Only one index is necessary. The queue entry can be
      marked invalid and the host_index updated immediately after eqe
      processing.
    After rework, xx_release routines are now DB write functions. Renamed
      the routines as such.
    Moved lpfc_sli4_eq_flush(), which does similar action, to same area.
    Replaced the 2 individual loops that walk an eq with a call to the
      common routine.
    Slightly revised lpfc_sli4_hba_handle_eqe() calling syntax.
    Added per-cpu counters to detect interrupt rates and scale
      interrupt coalescing values.

- Reworked CQ handling:
    Added common routine that walks cq, applying notify interval and max
      processing limits. Use queue_claimed to claim ownership of the queue
      while processing. Always rearm the queue whenever the common routine
      is called.
    Rework queue element processing, namely to eliminate hba_index vs
      host_index. Only one index is necessary. The queue entry can be
      marked invalid and the host_index updated immediately after cqe
      processing.
    After rework, xx_release routines are now DB write functions.  Renamed
      the routines as such.
    Replaced the 3 individual loops that walk a cq with a call to the
      common routine.
    Redefined lpfc_sli4_sp_handle_mcqe() to common handler definition with
      queue reference. Add increment for mbox completion to handler.

- Added a new module/sysfs attribute, lpfc_cq_max_proc_limit, to allow
   dynamic changing of the CQ max_proc_limit value being used.

Although this leaves an EQ as an immediate interrupt, that interrupt will
only occur if a CQ bound to it is in an armed state and has cqe's to
process.  By staying in the cq processing routine longer, high loads will
avoid generating more interrupts as they will only rearm as the processing
thread exits. The immediate interrupt is also beneficial to idle or
lower-processing CQ's as they get serviced immediately without being
penalized by sharing an EQ with a more loaded CQ.
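
As a rough illustration of the common walk routine described above, here
is a minimal single-threaded userspace model. It is a sketch, not the
driver code: struct model_queue, model_write_db() and
model_process_queue() are hypothetical names, the valid[] array stands in
for the valid bit in real queue entries, and a real driver would need an
atomic operation to claim the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ENTRIES 64

/* Hypothetical, simplified model of an EQ/CQ; not the lpfc structures. */
struct model_queue {
    uint32_t entry_count;      /* number of slots in the ring */
    uint32_t host_index;       /* next slot the host will examine */
    uint32_t notify_interval;  /* write the doorbell every N entries */
    uint32_t max_proc_limit;   /* max entries handled per invocation */
    bool     queue_claimed;    /* set while one context walks the queue */
    bool     armed;            /* state left by the last doorbell write */
    bool     valid[MODEL_MAX_ENTRIES]; /* slot holds an unprocessed entry */
    uint32_t doorbell_writes;  /* number of doorbell writes so far */
    uint32_t released;         /* entries reported as consumed so far */
};

/* Stand-in for a doorbell write: report consumption, optionally rearm. */
void model_write_db(struct model_queue *q, uint32_t consumed, bool rearm)
{
    q->released += consumed;
    q->doorbell_writes++;
    q->armed = rearm;
}

/* Walk the queue, honoring notify_interval and max_proc_limit. */
uint32_t model_process_queue(struct model_queue *q)
{
    uint32_t total = 0, pending = 0;

    assert(q->entry_count > 0 && q->entry_count <= MODEL_MAX_ENTRIES);
    assert(q->notify_interval > 0);

    if (q->queue_claimed)       /* someone else owns the queue */
        return 0;
    q->queue_claimed = true;

    while (total < q->max_proc_limit && q->valid[q->host_index]) {
        /* Entry-specific handling would go here. */
        q->valid[q->host_index] = false;   /* mark entry invalid */
        q->host_index = (q->host_index + 1) % q->entry_count;
        total++;
        if (++pending == q->notify_interval) {
            model_write_db(q, pending, false); /* notify, no rearm */
            pending = 0;
        }
    }

    q->queue_claimed = false;
    model_write_db(q, pending, true);          /* always rearm on exit */
    return total;
}
```

With a larger max_proc_limit the walk stays in the loop longer and rearms
only once on exit, which is the effect the rework is aiming for.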

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: cleanup: convert eq_delay to usdelay

Review of the eq coalescing logic showed the code was a bit fragmented.
Sometimes it would save/set via an interrupt max value, while in others it
would do so via a usdelay. There were also two places changing eq delay,
one place that issued mailbox commands, and another that changed via
register writes if supported.

Clean this up by:

- Standardizing the operation of lpfc_modify_hba_eq_delay() routine so
   that it is always told of a us delay to impose. The routine then chooses
   the best way to set that - via register or via mbx.

- Rather than two value types stored in eq->q_mode (usdelay if change via
   register, imax if change via mbox) - q_mode always contains usdelay.
   Before any value change, old vs new value is compared and only if
   different is a change done.

- Revised the dmult calculation. dmult is not set based on overall imax
   divided by hardware queues - instead imax applies to a single cpu and
   the value will be replicated to all cpus.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues

So far MSIX vector allocation assumed it would be 1:1 with hardware
queues. However, there are several reasons why fewer MSIX vectors may be
allocated than hardware queues such as the platform being out of vectors or
adapter limits being less than cpu count.

This patch reworks the MSIX/EQ relationships with the per-cpu hardware
queues so they can function independently. MSIX vectors will be equitably
split between cpu sockets/cores and then the per-cpu hardware queues will be
mapped to the vectors most efficient for them.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Fix setting affinity hints to correlate with hardware queues

The desired affinity for the hardware queue behavior is for hdwq 0 to be
affinitized with cpu 0, hdwq 1 to cpu 1, and so on. The implementation so
far does not do this if the number of cpus is greater than the number of
hardware queues (e.g. hardware queue allocation was administratively
reduced or hardware queue resources could not scale to the cpu count).

Correct the queue affinitization logic when queue count is less than
cpu count.
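
One way to picture the intended mapping is the sketch below. The text
only states that hdwq 0 goes with cpu 0, hdwq 1 with cpu 1, and so on;
wrapping the remaining cpus with a modulo is an assumption made for
illustration, and model_cpu_to_hdwq() is a hypothetical helper, not a
driver routine.

```c
#include <assert.h>

/*
 * Hypothetical mapping: cpus below num_hdwq map 1:1, and higher cpu
 * numbers wrap around (assumed policy, not taken from the driver).
 */
unsigned int model_cpu_to_hdwq(unsigned int cpu, unsigned int num_hdwq)
{
    assert(num_hdwq > 0);
    return cpu % num_hdwq;
}
```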

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Allow override of hardware queue selection policies

Default behavior is to use the information from the upper IO stacks to
select the hardware queue to use for IO submission, which typically has
good cpu affinity.

However, the driver, when used on some variants of the upstream kernel, has
found queuing information to be suboptimal for FCP or IO completion locked
on particular cpus.

For command submission situations, the lpfc_fcp_io_sched module parameter
can be set to specify a hardware queue selection policy that overrides the
os stack information.

For IO completion situations, rather than queuing cq processing based on the
cpu servicing the interrupting event, schedule the cq processing on the cpu
associated with the hardware queue's cq.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Adapt partitioned XRI lists to efficient sharing

The XRI get/put lists were partitioned per hardware queue. However, the
adapter rarely had sufficient resources to give a large number of resources
per queue. As such, it became common for a cpu to encounter a lack of XRI
resource and request the upper io stack to retry after returning a BUSY
condition. This occurred even though other cpus were idle and not using
their resources.

Create as efficient a scheme as possible to move resources to the cpus that
need them. Each cpu maintains a small private pool which it allocates from
for io. There is a watermark that the cpu attempts to keep in the private
pool. The private pool, when empty, pulls from the cpu's global pool.
When the cpu's global pool is empty it will pull from another cpu's
global pool. As there are many cpu global pools (1 per cpu or hardware queue
count) and as each cpu selects what cpu to pull from at different rates and
at different times, it creates a randomizing effect that minimizes the
number of cpus that will contend with each other when they steal XRIs from
another cpu's global pool.

On io completion, a cpu will push the XRI back on to its private pool. A
watermark level is maintained for the private pool such that when it is
exceeded it will move XRIs to the CPU global pool so that other cpus may
allocate them.

On NVME, as heartbeat commands are critical to get placed on the wire, a
single expedite pool is maintained. When a heartbeat is to be sent, it will
allocate an XRI from the expedite pool rather than the normal cpu
private/global pools. On any io completion, if a reduction in the expedite
pool is seen, it will be replenished before the XRI is placed on the cpu
private pool.
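
The private/global pool interaction can be modeled roughly as follows.
This is a sketch under assumptions: it tracks only XRI counts, the names
(struct model_pools, model_get_xri(), model_put_xri(), the refill batch
size) are hypothetical, the steal order is a fixed scan rather than the
per-cpu varying selection described above, and locking and the NVME
expedite pool are left out.

```c
#include <assert.h>

#define MODEL_NCPU 4

/* Hypothetical count-only model of the per-cpu XRI pools. */
struct model_pools {
    unsigned int pvt[MODEL_NCPU]; /* XRIs in each cpu's private pool */
    unsigned int glb[MODEL_NCPU]; /* XRIs in each cpu's global pool */
    unsigned int refill;          /* batch pulled into a private pool */
    unsigned int pvt_high;        /* private-pool high watermark */
};

/* Move up to 'want' XRIs from global pool 'src' to cpu's private pool. */
static void model_pull(struct model_pools *p, int cpu, int src,
                       unsigned int want)
{
    unsigned int n = p->glb[src] < want ? p->glb[src] : want;

    p->glb[src] -= n;
    p->pvt[cpu] += n;
}

/* Returns 1 if an XRI was allocated, 0 if the caller must report BUSY. */
int model_get_xri(struct model_pools *p, int cpu)
{
    int i;

    assert(cpu >= 0 && cpu < MODEL_NCPU);
    assert(p->refill > 0);

    /* Private pool empty: try own global pool first, then the others. */
    for (i = 0; i < MODEL_NCPU && p->pvt[cpu] == 0; i++)
        model_pull(p, cpu, (cpu + i) % MODEL_NCPU, p->refill);

    if (p->pvt[cpu] == 0)
        return 0;
    p->pvt[cpu]--;
    return 1;
}

/* On completion, return the XRI and spill any excess to the global pool. */
void model_put_xri(struct model_pools *p, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NCPU);

    p->pvt[cpu]++;
    if (p->pvt[cpu] > p->pvt_high) {
        unsigned int excess = p->pvt[cpu] - p->pvt_high;

        p->pvt[cpu] -= excess;
        p->glb[cpu] += excess;
    }
}
```

In this model a cpu only reports BUSY when its private pool and every
global pool are empty, which is the behavior the rework is after.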

Statistics are added to aid understanding the XRI levels on each cpu and
their behaviors.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Synchronize hardware queues with SCSI MQ interface

Now that the lower half has much better per-cpu parallelization using the
hardware queues, the SCSI MQ support needs to be tied into it.

This involves the following mods:

- Use the hardware queue info from the midlayer to help select the
hardware queue to utilize. This required a change to the get_scsi_buf_xxx
routines.

- Remove lpfc_sli4_scmd_to_wqidx_distr() routine. No longer needed.

- Includes fix for SLI-3 that does not have multi queue parallelization.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Convert ring number to hardware queue for nvme wqe posting.

SLI4 nvme functions are passing the SLI3 ring number when posting wqe to
hardware. This should be indicating the hardware queue to use, not the ring
number.

Replace ring number with the hardware queue that should be used.

Note: SCSI avoided this issue as it utilized an older lpfc_issue_iocb
routine that properly adapts.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Move SCSI and NVME Stats to hardware queue structures

Many io statistics were being sampled and saved using adapter-based data
structures. This was creating a lot of contention and cache thrashing in
the I/O path.

Move the statistics to the hardware queue data structures. Given the
per-queue data structures, use of atomic types is lessened.

Add new sysfs and debugfs stat routines to collate the per hardware queue
values and report at an adapter level.
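
The collation step amounts to summing the per-queue counters when a
report is requested. A minimal sketch, assuming hypothetical counter
names (struct model_hdwq_stat, io_submitted, io_completed) rather than
the driver's actual fields:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-hardware-queue counters, updated only by that queue. */
struct model_hdwq_stat {
    uint64_t io_submitted;
    uint64_t io_completed;
};

/* Sum the per-queue counters into one adapter-level view. */
struct model_hdwq_stat model_collate(const struct model_hdwq_stat *hdwq,
                                     size_t num_hdwq)
{
    struct model_hdwq_stat total = { 0, 0 };
    size_t i;

    assert(hdwq != NULL || num_hdwq == 0);

    for (i = 0; i < num_hdwq; i++) {
        total.io_submitted += hdwq[i].io_submitted;
        total.io_completed += hdwq[i].io_completed;
    }
    return total;
}
```

The cost of the sum is paid on the rare reporting path, keeping the I/O
path free of shared adapter-wide counters.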

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Adapt cpucheck debugfs logic to Hardware Queues

Similar to the io execution path that reports cpu context information, the
debugfs routines for cpu information need to be aligned with new hardware
queue implementation.

Convert debugfs and nvme cpucheck statistics to report information per
Hardware Queue.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: cleanup: Remove unused FCP_XRI_ABORT_EVENT slowpath event

Both NVME and SCSI aborts are now processed off the CQ workqueue and do not
generate events for the slowpath any more.

Remove the unused event code.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Partition XRI buffer list across Hardware Queues

Once the IO buff allocations were made shared, there was a single XRI
buffer list shared by all hardware queues. A single list isn't great for
performance when shared across the per-cpu hardware queues.

Create a separate XRI IO buffer get/put list for each Hardware Queue. As
SGLs and associated IO buffers get allocated/posted to the firmware, round
robin their assignment across all available Hardware Queues so that there
is an equitable assignment.
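
The round robin assignment can be pictured with the small model below.
It only counts how many buffers each queue's list would receive;
model_distribute_xri() is a hypothetical helper and does not reflect the
driver's list handling.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: hand out num_bufs buffers one at a time across
 * num_hdwq queues; counts[q] ends up with the number given to queue q.
 */
void model_distribute_xri(size_t num_bufs, size_t num_hdwq, size_t *counts)
{
    size_t i;

    assert(num_hdwq > 0 && counts != NULL);

    for (i = 0; i < num_hdwq; i++)
        counts[i] = 0;
    for (i = 0; i < num_bufs; i++)
        counts[i % num_hdwq]++;   /* next buffer goes to the next queue */
}
```

No queue ends up with more than one buffer above any other, which is the
equitable assignment referred to above.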

Modify SCSI and NVME IO submit code paths to use the Hardware Queue logic
for XRI allocation.

Add a debugfs interface to display hardware queue statistics.

Added new empty_io_bufs counter to track if a cpu runs out of XRIs.

Replace common_ variables/names with io_ to make meanings clearer.

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu

Currently, both nvme and fcp each have their own concept of an io_channel,
which is a combination wq/cq and associated msix.  Different cpus would
share an io_channel.

The driver is now moving to per-cpu wq/cq pairs and msix vectors.  The
driver will still use separate wq/cq pairs per protocol on each cpu, but
the protocols will share the msix vector.

Given the elimination of the nvme and fcp io channels, the module
parameters will be removed.  A new parameter, lpfc_hdw_queue, is added which
allows the wq/cq pair allocation per cpu to be overridden and allocated to a
lesser value. If lpfc_hdw_queue is zero, the number of pairs allocated will
be based on the number of cpus. If non-zero, the parameter specifies the
number of queues to allocate. At this time, the maximum non-zero value is
64.

To manage this new paradigm, a new hardware queue structure is created to
track queue activity and relationships.

As MSIX vector allocation must be known before setting up the
relationships, msix allocation now occurs before queue data structures are
allocated. If the number of vectors allocated is less than the desired
hardware queues, the hardware queue counts will be reduced to the number of
vectors.
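
The sizing rules above can be summarized in a short sketch.
model_hdwq_count() is a hypothetical helper written for illustration; the
cap of 64 and the zero-means-cpu-count rule come from the text, the rest
is an assumption about how the pieces fit together.

```c
#include <assert.h>

/* Maximum non-zero lpfc_hdw_queue value, per the description above. */
#define MODEL_HDWQ_PARAM_MAX 64u

/*
 * Hypothetical helper: decide how many hardware queues to set up.
 *   hdw_queue_param  value of lpfc_hdw_queue (0 means "use cpu count")
 *   num_cpus         number of cpus
 *   num_vectors      MSIX vectors actually allocated
 */
unsigned int model_hdwq_count(unsigned int hdw_queue_param,
                              unsigned int num_cpus,
                              unsigned int num_vectors)
{
    unsigned int count;

    assert(hdw_queue_param <= MODEL_HDWQ_PARAM_MAX);
    assert(num_cpus > 0 && num_vectors > 0);

    /* Zero selects one queue per cpu; non-zero is taken as given. */
    count = hdw_queue_param ? hdw_queue_param : num_cpus;

    /* Fewer vectors than desired queues: reduce to the vector count. */
    if (num_vectors < count)
        count = num_vectors;

    return count;
}
```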

Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>