Kashyap Desai [Thu, 20 May 2021 15:25:26 +0000 (20:55 +0530)]
scsi: mpi3mr: Add support for internal watchdog thread
The watchdog thread is the driver's internal thread which does a few things
such as detecting firmware faults, resetting the controller, performing
timestamp sync, etc.
Kashyap Desai [Thu, 20 May 2021 15:25:25 +0000 (20:55 +0530)]
scsi: mpi3mr: Add support for queue command processing
Send Port Enable Request to FW for Device Discovery. As part of port
enable completion, the driver calls the scan_start and scan_finished hooks.
SCSI layer references like sdev, starget, etc. are added, but actual device
discovery will be supported once the driver adds complete event process
handling.
Kashyap Desai [Thu, 20 May 2021 15:25:24 +0000 (20:55 +0530)]
scsi: mpi3mr: Create operational request and reply queue pair
Create operational request and reply queue pair.
The MPI3 transport interface consists of an Administrative Request Queue,
an Administrative Reply Queue, and Operational Messaging Queues. The
Operational Messaging Queues are the primary communication mechanism
between the host and the I/O Controller (IOC). Request messages, allocated
in host memory, identify I/O operations to be performed by the IOC. These
operations are queued on an Operational Request Queue by the host driver.
Reply descriptors track I/O operations as they complete. The IOC queues
these completions in an Operational Reply Queue.
To fulfil a large contiguous memory requirement, the driver creates multiple
segments and provides the list of segments. Each segment must be 4K in
size, which is a hardware requirement. An element array is contiguous or
segmented. A contiguous element array is located in contiguous physical
memory. A contiguous element array must be aligned on an element size
boundary. An element's physical address within the array may be directly
calculated from the base address, the Producer/Consumer index, and the
element size.
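
The address calculation described above can be sketched in user-space C as follows. This is a minimal illustration, not the driver's code: the function names, the SEG_SIZE constant, and the assumption that the element size divides the segment size evenly are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical constant: the 4K segment size named in the text above. */
#define SEG_SIZE 4096u

/* Contiguous array: address = base + index * element size. */
static uint64_t elem_addr_contig(uint64_t base, uint32_t index,
				 uint32_t elem_size)
{
	return base + (uint64_t)index * elem_size;
}

/*
 * Segmented array: pick the segment that holds the element, then apply
 * the same base + offset arithmetic inside that segment.
 * Assumes elem_size divides SEG_SIZE evenly.
 */
static uint64_t elem_addr_segmented(const uint64_t *seg_bases, uint32_t index,
				    uint32_t elem_size)
{
	uint32_t per_seg = SEG_SIZE / elem_size;

	return seg_bases[index / per_seg] +
	       (uint64_t)(index % per_seg) * elem_size;
}
```

With 128-byte elements, each 4K segment holds 32 elements, so index 33 resolves to the second element of the second segment.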
The expected phase identifier bit is used to find valid entries on the
reply queue. The driver sets the <ephase> bit and the IOC inverts the value
of this bit on each pass.
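
One way to picture the phase-bit check is the sketch below. It is an assumption-laden model, not the mpi3mr implementation: the queue depth, the bit position of the phase flag, the descriptor width, and all names are hypothetical. It assumes the consumer starts by expecting phase 1 and flips its expectation each time its index wraps.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define QDEPTH     4u    /* hypothetical queue depth */
#define PHASE_MASK 0x1u  /* hypothetical position of the phase bit */

struct reply_queue {
	uint16_t desc[QDEPTH]; /* each descriptor carries a phase bit */
	uint32_t ci;           /* consumer index */
	uint16_t ephase;       /* phase value expected on the current pass */
};

static void rq_init(struct reply_queue *q)
{
	memset(q, 0, sizeof(*q));
	q->ephase = 1; /* zeroed memory therefore reads as "not yet valid" */
}

/* Returns true and copies the descriptor if the entry at ci is valid. */
static bool rq_consume(struct reply_queue *q, uint16_t *out)
{
	uint16_t d = q->desc[q->ci];

	if ((d & PHASE_MASK) != q->ephase)
		return false; /* stale entry from the previous pass */

	*out = d;
	if (++q->ci == QDEPTH) {
		q->ci = 0;
		q->ephase ^= 1; /* expect the inverted bit on the next pass */
	}
	return true;
}
```

The point of the scheme is that the consumer never has to clear consumed entries: an entry left over from the previous pass simply carries the wrong phase.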
Kees Cook [Fri, 28 May 2021 18:13:37 +0000 (11:13 -0700)]
scsi: isci: Use correctly sized target buffer for memcpy()
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), avoid intentionally writing across
neighboring array fields.
Switch from rsp_ui to resp_buf, since resp_ui isn't SSP_RESP_IU_MAX_SIZE
bytes in length. This avoids future compile-time warnings.
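
The general pattern, copying into the member that really has the full size rather than into a smaller overlapping one, might look like the sketch below. The type and field names are hypothetical stand-ins and RESP_MAX only plays the role of SSP_RESP_IU_MAX_SIZE; this is not the isci driver's layout.

```c
#include <stdint.h>
#include <string.h>

#define RESP_MAX 16 /* hypothetical stand-in for SSP_RESP_IU_MAX_SIZE */

struct short_iu {
	uint8_t hdr[4]; /* structured view, smaller than RESP_MAX */
};

union response {
	struct short_iu iu;    /* only 4 bytes long */
	uint8_t buf[RESP_MAX]; /* full-size view of the same storage */
};

/*
 * Copy into the member whose declared size covers the whole copy, so a
 * field-bounds-checking memcpy() has nothing to complain about.
 */
static void save_response(union response *r, const uint8_t *src, size_t len)
{
	if (len > sizeof(r->buf))
		len = sizeof(r->buf);
	memcpy(r->buf, src, len);
}
```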
Kees Cook [Fri, 28 May 2021 18:13:36 +0000 (11:13 -0700)]
scsi: esas2r: Switch to flexible array member
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), avoid intentionally writing across
neighboring array fields.
Remove old-style 1-byte array in favor of a flexible array[1] to avoid
future false-positive cross-field memcpy() warning in:
Randy Dunlap [Sat, 29 May 2021 23:48:57 +0000 (16:48 -0700)]
scsi: FlashPoint: Rename si_flags field
The BusLogic driver has build errors on ia64 due to a name collision (in
the #included FlashPoint.c file). Rename the struct field in struct
sccb_mgr_info from si_flags to si_mflags (manager flags) to mend the build.
This is the first problem. There are 50+ others after this one:
In file included from ../include/uapi/linux/signal.h:6,
from ../include/linux/signal_types.h:10,
from ../include/linux/sched.h:29,
from ../include/linux/hardirq.h:9,
from ../include/linux/interrupt.h:11,
from ../drivers/scsi/BusLogic.c:27:
../arch/ia64/include/uapi/asm/siginfo.h:15:27: error: expected ':', ',', ';', '}' or '__attribute__' before '.' token
15 | #define si_flags _sifields._sigfault._flags
| ^
../drivers/scsi/FlashPoint.c:43:6: note: in expansion of macro 'si_flags'
43 | u16 si_flags;
| ^~~~~~~~
In file included from ../drivers/scsi/BusLogic.c:51:
../drivers/scsi/FlashPoint.c: In function 'FlashPoint_ProbeHostAdapter':
../drivers/scsi/FlashPoint.c:1076:11: error: 'struct sccb_mgr_info' has no member named '_sifields'
1076 | pCardInfo->si_flags = 0x0000;
| ^~
../drivers/scsi/FlashPoint.c:1079:12: error: 'struct sccb_mgr_info' has no member named '_sifields'
scsi: mpt3sas: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple
of warnings by explicitly adding break statements instead of just letting
the code fall through to the next case.
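
As a generic illustration of the kind of change described (not the mpt3sas code itself), a case that used to run off its end into the next label gets an explicit break. The function and values here are hypothetical.

```c
/* Hypothetical example: every case ends in an explicit break. */
static int severity_of(int code)
{
	int sev = 0;

	switch (code) {
	case 1:
		sev = 1;
		break; /* previously this case could silently fall through */
	case 2:
		sev = 2;
		break;
	default:
		break;
	}
	return sev;
}
```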
Hannes Reinecke [Tue, 27 Apr 2021 08:30:46 +0000 (10:30 +0200)]
scsi: core: Drop obsolete Linux-specific SCSI status codes
Originally the SCSI subsystem has been using 'special' SCSI status codes,
which were the SAM-specified ones but shifted by 1. As most drivers have
now been modified to use the SAM-specified ones, having two nearly
identical sets of definitions only causes confusion.
The Linux-specific SCSI status codes have been marked obsolete for several
years so drop them and use the SAM-specified status codes throughout.
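
The "shifted by 1" relationship can be shown with a few lines of C. The SAM values below are the standard ones; the helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* SAM-specified status codes. */
#define SAM_STAT_GOOD            0x00
#define SAM_STAT_CHECK_CONDITION 0x02
#define SAM_STAT_BUSY            0x08
#define SAM_STAT_TASK_SET_FULL   0x28

/* The obsolete Linux codes were the SAM values shifted right by one bit. */
static uint8_t legacy_from_sam(uint8_t sam)
{
	return (sam >> 1) & 0x7f;
}

static uint8_t sam_from_legacy(uint8_t legacy)
{
	return (uint8_t)(legacy << 1);
}
```

For example, SAM_STAT_CHECK_CONDITION (0x02) corresponded to a legacy value of 0x01, which is why keeping both sets invited mix-ups.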
Hannes Reinecke [Tue, 27 Apr 2021 08:30:42 +0000 (10:30 +0200)]
scsi: fdomain: Translate message to host byte status
Instead of setting the message byte translate it to the appropriate host
byte. As error recovery would return DID_ERROR for any non-zero message
byte the translation doesn't change the error handling.
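
The message-to-host-byte translation used in this and the neighbouring commits can be sketched as below. The layout (status in bits 0-7, host byte in bits 16-23) and the DID_OK/DID_ERROR values follow the SCSI result word; the helper names are hypothetical, and collapsing every non-zero message to DID_ERROR is a simplifying assumption that mirrors what the text says error recovery did anyway. A real driver may pick a more specific host byte per message.

```c
#include <stdint.h>

#define DID_OK    0x00
#define DID_ERROR 0x07

/* Build a result word from a host byte and a SAM status byte. */
static uint32_t make_result(uint8_t host, uint8_t status)
{
	return ((uint32_t)host << 16) | status;
}

/*
 * Hypothetical helper: instead of storing the message byte in bits 8-15,
 * fold it into the host byte. Message 0 is COMMAND COMPLETE.
 */
static uint32_t result_from_message(uint8_t msg, uint8_t status)
{
	uint8_t host = (msg == 0) ? DID_OK : DID_ERROR;

	return make_result(host, status);
}
```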
Hannes Reinecke [Tue, 27 Apr 2021 08:30:38 +0000 (10:30 +0200)]
scsi: fas216: Translate message to host byte status
Instead of setting the message byte translate it to the appropriate host
byte. As error recovery would return DID_ERROR for any non-zero message
byte the translation doesn't change the error handling.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:37 +0000 (10:30 +0200)]
scsi: advansys: Do not set message byte in SCSI status
The host byte in the SCSI status takes precedence during error recovery, so
there is no point in setting the message byte in addition to a host byte
which is not DID_OK.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:36 +0000 (10:30 +0200)]
scsi: aha152x: Do not set message byte when calling scsi_done()
The done() function is called with a host_byte indicating the actual error
when the message byte is set. As the host byte takes precedence during
error recovery we can drop setting the message byte if the host byte is
set, too. The only other case is when the host byte is DID_OK, but in that
case the message byte is always COMMAND_COMPLETE (i.e. 0), so we can drop
it there, too.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:34 +0000 (10:30 +0200)]
scsi: acornscsi: Translate message byte to host byte
Instead of setting the message byte translate it to the appropriate host
byte. As error recovery would return DID_ERROR for any non-zero message
byte the translation doesn't change the error handling. And use SCSI
result accessors while we're at it.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:32 +0000 (10:30 +0200)]
scsi: mesh: Translate message to host byte status
Instead of setting the message byte translate it to a host byte status. As
the error recovery would map it to DID_ERROR anyway the translation doesn't
change the SCSI error handling.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:31 +0000 (10:30 +0200)]
scsi: wd33c93: Translate message byte to host byte
Instead of setting the message byte translate it to the appropriate host
byte. As error recovery would return DID_ERROR for any non-zero message
byte the translation doesn't change the error handling.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:25 +0000 (10:30 +0200)]
scsi: dc395: Translate message bytes
Drop message byte setting if the host byte is already set, and translate
message bytes into the related host bytes when evaluating an overrun or
underrun.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:19 +0000 (10:30 +0200)]
scsi: xen-scsifront: Compability status handling
The Xen guest might run against arbitrary backends, so the driver might
receive a status with driver_byte set. Map these errors to DID_ERROR to be
consistent with recent changes.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:18 +0000 (10:30 +0200)]
scsi: xen-scsiback: Use DID_ERROR instead of DRIVER_ERROR
DRIVER_ERROR was supposed to signal an error generated by the driver, which
xen-scsiback arguably isn't. Also the driver bytes don't have a detailed
error recovery, so we should rather return DID_ERROR instead of
DRIVER_ERROR.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:16 +0000 (10:30 +0200)]
scsi: core: Do not use DRIVER_INVALID
There is no point in returning DID_ABORT together with DRIVER_INVALID, as
the caller couldn't care less where the abort originated. So drop the use
of DRIVER_INVALID.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:15 +0000 (10:30 +0200)]
scsi: core: Kill DRIVER_SENSE
Replace the check for DRIVER_SENSE with a check for
scsi_status_is_check_condition().
Audit all callsites to ensure the SAM status is set correctly. For
backwards compatibility move the DRIVER_SENSE definition to sg.h, and update
sg, bsg, and scsi_ioctl to set the DRIVER_SENSE driver_status whenever
SAM_STAT_CHECK_CONDITION is present.
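
A user-space sketch of the idea follows. The check is modelled on the description above; the function names are hypothetical stand-ins, and compat_driver_status() is only an assumption about how an interface such as sg could keep reporting DRIVER_SENSE to user space.

```c
#include <stdbool.h>

#define SAM_STAT_CHECK_CONDITION 0x02
#define DRIVER_SENSE             0x08

/* True if the status byte of a result word is CHECK CONDITION. */
static bool status_is_check_condition(int status)
{
	if (status < 0)
		return false; /* negative values are error codes */
	return (status & 0xfe) == SAM_STAT_CHECK_CONDITION;
}

/*
 * Hypothetical compatibility shim: derive the legacy driver_status from
 * the SAM status rather than from a driver byte stored in the result.
 */
static unsigned int compat_driver_status(int result)
{
	return status_is_check_condition(result) ? DRIVER_SENSE : 0;
}
```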
Hannes Reinecke [Tue, 27 Apr 2021 08:30:12 +0000 (10:30 +0200)]
scsi: core: Stop using DRIVER_ERROR
Return the actual error code in __scsi_execute() (which, according to the
documentation, should have happened anyway). And audit all callers to cope
with negative return values from __scsi_execute() and friends.
Resetting the interrupt aggregation counters first and reading the
DOOR_BELL afterward allows us to handle all the completed requests. To
prevent starvation of other interrupts, the DB is read only once after the
reset. The downside of this solution is the possibility of a false
interrupt if the device completes another request after the aggregation
reset and before the DB is read.
Prevent ufshcd_intr() from reporting a false positive "Unhandled interrupt"
message if the above scenario is triggered.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:09 +0000 (10:30 +0200)]
scsi: core: Fixup calling convention for scsi_mode_sense()
The description for scsi_mode_sense() claims to return the number of valid
bytes on success, which is not what the code does. Additionally there is
no gain in returning the SCSI status, as everything the callers do is to
check against scsi_result_is_good(), which is what scsi_mode_sense() does
already. So change the calling convention to return a standard error code
on failure, and 0 on success, and adapt the description and all callers.
Suganath Prabu S [Tue, 18 May 2021 05:16:25 +0000 (10:46 +0530)]
scsi: mpt3sas: Handle firmware faults during second half of IOC init
If a firmware fault occurs while scanning the devices during IOC
initialization then the driver issues the hard reset operation to recover
the IOC. However, the driver is not issuing a Port enable request
message as part of hard reset operation during IOC initialization. Due to
this, the driver will not receive any device discovery-related events
and hence devices will not be accessible.
Teach the driver to gracefully handle firmware faults while scanning for
target devices during IOC initialization. Make the driver issue a port
enable request message as part of hard reset operation. This permits
receiving device discovery-related events from the firmware after the hard
reset operation completes.
Hannes Reinecke [Tue, 27 Apr 2021 08:30:08 +0000 (10:30 +0200)]
scsi: scsi_ioctl: Return error code when blk_rq_map_kern() fails
The callers of sg_scsi_ioctl() already check for negative return values, so
we can drop the usage of DRIVER_ERROR and return the error from
blk_rq_map_kern() instead.
Suganath Prabu S [Tue, 18 May 2021 05:16:24 +0000 (10:46 +0530)]
scsi: mpt3sas: Handle firmware faults during first half of IOC init
During the first half of IOC initialization (i.e. before device scanning),
if any firmware fault occurs then the driver aborts the IOC initialization
operation.
Modify the driver to issue a diag reset operation to recover IOC from fault
state and reinitialize the IOC.
Suganath Prabu S [Tue, 18 May 2021 05:16:23 +0000 (10:46 +0530)]
scsi: mpt3sas: Fix deadlock while cancelling the running firmware event
Do not cancel current running firmware event work if the event type is
different from MPT3SAS_REMOVE_UNRESPONDING_DEVICES. Otherwise a deadlock
can be observed while cancelling the current firmware event work if a hard
reset operation is called as part of processing the current event.
John Garry [Wed, 19 May 2021 14:31:02 +0000 (22:31 +0800)]
scsi: core: Cap scsi_host cmd_per_lun at can_queue
The sysfs handling function sdev_store_queue_depth() enforces that the sdev
queue depth cannot exceed shost can_queue. The initial sdev queue depth
comes from shost cmd_per_lun. However, the LLDD may manually set
cmd_per_lun to be larger than can_queue, which leads to an initial sdev
queue depth greater than can_queue.
Such an issue was reported in [0], which caused a hang. That has since been
fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun, set to
max_queue").
Stop this possibly happening for other drivers by capping shost cmd_per_lun
at shost can_queue.
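
The cap itself is a one-line clamp. The sketch below uses a hypothetical two-field struct in place of the real Scsi_Host and is not the code from the patch.

```c
/* Hypothetical stand-in for the two relevant Scsi_Host fields. */
struct host {
	int   can_queue;   /* maximum commands the host can accept */
	short cmd_per_lun; /* initial per-device queue depth */
};

/* Clamp cmd_per_lun so the initial sdev depth never exceeds can_queue. */
static void cap_cmd_per_lun(struct host *h)
{
	if (h->cmd_per_lun > h->can_queue)
		h->cmd_per_lun = (short)h->can_queue;
}
```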
James Smart [Fri, 14 May 2021 19:55:58 +0000 (12:55 -0700)]
scsi: lpfc: Reregister FPIN types if ELS_RDF is received from fabric controller
FC-LS-5 specifies that a received RDF implies a possible change to fabric
supported diagnostic functions. Endpoints are to re-perform the RDF
exchange with the fabric to enable possible new features or adapt to
changes in values.
This patch adds the logic to RDF receive to re-perform the RDF exchange
with the switch.
James Smart [Fri, 14 May 2021 19:55:57 +0000 (12:55 -0700)]
scsi: lpfc: Add a option to enable interlocked ABTS before job completion
Default behavior for the driver, when aborting an I/O, is to terminate the
I/O with the adapter. The adapter will initiate an ABTS to terminate the
exchange on the link and mark the exchange as terminated so that no further
use of the sgl or any traffic for the exchange is worked on. Completion on
the Abort is then posted to the driver, which as the I/O is terminated can
complete the I/O to the OS. This completion may occur prior to the ABTS
handshake completing on the wire. The ABTS handshake can take a long time
to complete with timeouts and retries reaching 60+ seconds. Note: if
retries fail, LOGO occurs.
Some devices want to ensure that the ABTS handshake fully completes (this
device has fully ack'd it) before the I/O completion is posted back to the
OS, where a failed I/O may be retried via a different path.
To support this behavior, an option was added to the driver to change I/O
completion from the Abort cmd completion to the Exchange termination (aka
ABTS) completion.
James Smart [Fri, 14 May 2021 19:55:55 +0000 (12:55 -0700)]
scsi: lpfc: Ignore GID-FT response that may be received after a link flip
When a link bounce happens, there is a possibility that responses to
requests posted prior to the link bounce could be received. This is
problematic as the counter to track reglogin completion after link up can
become out of sync with the real state.
As there is no reason to process a request made in a prior link up context,
eliminate all the disturbance by tagging the request with the event_tag
maintained by the SLI Port for the link. The event_tag will change on every
link state transition. As long as the tag matches the current event_tag,
the response can be processed. If it doesn't match, just discard the
response.
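
The tagging scheme can be modelled in a few lines. All names below are hypothetical, and bumping the tag by one on each transition is an assumption; the text only says that the tag changes on every link state transition.

```c
#include <stdbool.h>
#include <stdint.h>

struct link_ctx {
	uint32_t event_tag; /* changes on every link state transition */
};

struct request {
	uint32_t event_tag; /* snapshot taken when the request was issued */
};

static void link_transition(struct link_ctx *l)
{
	l->event_tag++; /* assumption: any change of value would do */
}

static void tag_request(const struct link_ctx *l, struct request *r)
{
	r->event_tag = l->event_tag;
}

/* A response is processed only if it belongs to the current link-up. */
static bool response_is_current(const struct link_ctx *l,
				const struct request *r)
{
	return r->event_tag == l->event_tag;
}
```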
James Smart [Fri, 14 May 2021 19:55:54 +0000 (12:55 -0700)]
scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller
During link bounce testing, RPI counts were seen to differ from the number
of nodes. For fabric and domain controllers, a temporary RPI is assigned,
but the code isn't registering it. If the nodes do go away, such as on link
down, the temporary RPI isn't being released.
Change the way these two fabric services are managed, make them behave like
any other remote port. Register the RPI and register with the transport.
Never leave the nodes in a NPR or UNUSED state where their RPI is in limbo.
This allows them to follow normal dev_loss_tmo handling, RPI refcounting,
and normal removal rules. It also allows fabric I/Os to use the RPI for
traffic requests.
Note: There is some logic that still has a couple of exceptions for the
Domain controller (0xfffcXX). There are cases where the fabric won't have a
valid login but will send RDP. Other times, it will send a LOGO then an
RDP. It makes for ad-hoc behavior to manage the node. Exceptions are
documented in the code.
James Smart [Fri, 14 May 2021 19:55:53 +0000 (12:55 -0700)]
scsi: lpfc: Fix Node recovery when driver is handling simultaneous PLOGIs
When lpfc is handling a solicited and unsolicited PLOGI with another
initiator, the remote initiator is never recovered. The node for the
initiator is erroneously removed and all resources released.
In lpfc_cmpl_els_plogi(), when lpfc_els_retry() returns a failure code, the
driver is calling the state machine with a device remove event because the
remote port is not currently registered with the SCSI or NVMe
transports. The issue is that on a PLOGI "collision" the driver correctly
aborts the solicited PLOGI and allows the unsolicited PLOGI to complete the
process, but this process is interrupted with a device_rm event.
Introduce logic in the PLOGI completion to capture the PLOGI collision
event and jump out of the routine. This will avoid removal of the node.
If there is no collision, the normal node removal will occur.
James Smart [Fri, 14 May 2021 19:55:52 +0000 (12:55 -0700)]
scsi: lpfc: Add ndlp kref accounting for resume RPI path
The driver is crashing due to a bad pointer during driver load in an
adisc acc receive routine. The driver is missing node get/put in the
mbx_resume_rpi paths.
Fix by adding the proper gets and puts into the resume_rpi path.
James Smart [Fri, 14 May 2021 19:55:51 +0000 (12:55 -0700)]
scsi: lpfc: Fix "Unexpected timeout" error in direct attach topology
An 'unexpected timeout' message may be seen in a point-2-point topology.
The message occurs when a PLOGI is received before the driver is notified
of FLOGI completion. The FLOGI completion failure causes discovery to be
triggered for a second time. The discovery timer is restarted but no new
discovery activity is initiated, thus the timeout message eventually
appears.
In point-2-point, when discovery has progressed before the FLOGI completion
is processed, it is not a failure. Add code to FLOGI completion to detect
that discovery has progressed and exit the FLOGI handling (noop'ing it).
James Smart [Fri, 14 May 2021 19:55:50 +0000 (12:55 -0700)]
scsi: lpfc: Fix non-optimized ERSP handling
When processing an NVMe ERSP IU which didn't match the optimized CQE-only
path, the status was being left to the WQE status. WQE status is non-zero
as it is indicating a non-optimized completion that needs to be handled by
the driver.
Fix by clearing the status field when falling into the non-optimized
case. Log message added to track optimized vs non-optimized debug.
James Smart [Fri, 14 May 2021 19:55:49 +0000 (12:55 -0700)]
scsi: lpfc: Fix unreleased RPIs when NPIV ports are created
While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.
Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a lot of fabric ELS's to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.
To resolve the issue, do the following:
- Restore the RPI release code, but move it so that it is in
line with the new node cleanup design.
- NPIV ports now release the RPI and drop the node when the caller sets
the NLP_RELEASE_RPI flag.
- Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
release of RPI to free pool.
- Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
completed.
- Stop offline_prep from skipping nodes that are UNUSED. The RPI may
not have been released.
- Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.
- Fixed up debugfs RPI displays for better debugging.
Martin Wilck [Fri, 14 May 2021 15:32:14 +0000 (17:32 +0200)]
scsi: scsi_dh_alua: Retry RTPG on a different path after failure
If an RTPG fails, we can't infer anything wrt. the state of the ports in
the port group except that we were unable to reach the one port on which
the RTPG had failed. "offline" is just a secondary port state, which means
that we can't infer the state of any port in the PG from the failure (in
fact, even the failed port might still be in "active/optimized" primary
port access state).
Therefore, when we encounter an RTPG failure, we should retry the RTPG on a
different port. This avoids falsely setting port states to offline for
unreachable ports. To do this, ports on which an RTPG has failed are
temporarily set to "disabled" to avoid repeating the failed I/O on the same
target port. Once the RTPG has either succeeded on one port or failed on
all ports of the PG, the ports are enabled again.
Bart Van Assche [Wed, 19 May 2021 20:20:58 +0000 (13:20 -0700)]
scsi: ufs: Use designated initializers in ufs_pm_lvl_states[]
The comments in the enum ufs_pm_level definition are redundant. Remove the
comments from the ufs_pm_level enum and use designated initializers in the
ufs_pm_lvl_states[] definition instead.
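
Designated initializers tie each table entry to its enum constant, which is what makes the explanatory comments unnecessary. The enum, struct, and values below are hypothetical placeholders, not the UFS definitions.

```c
/* Hypothetical power-management levels. */
enum pm_level {
	PM_LVL_0,
	PM_LVL_1,
	PM_LVL_2,
	PM_LVL_MAX
};

struct pm_lvl_state {
	int dev_state;  /* placeholder values, not real UFS states */
	int link_state;
};

/* Each entry names the index it initializes, so order cannot drift. */
static const struct pm_lvl_state pm_lvl_states[PM_LVL_MAX] = {
	[PM_LVL_0] = { 1, 1 },
	[PM_LVL_1] = { 1, 2 },
	[PM_LVL_2] = { 2, 1 },
};
```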
Sergey Shtylyov [Wed, 19 May 2021 19:20:15 +0000 (22:20 +0300)]
scsi: hisi_sas: Propagate errors in interrupt_init_v1_hw()
After commit 6c11dc060427 ("scsi: hisi_sas: Fix IRQ checks") we have the
error codes returned by platform_get_irq() ready for the propagation
upstream in interrupt_init_v1_hw() -- that will fix still broken deferred
probing. Let's propagate the error codes from devm_request_irq() as well
since I don't see the reason to override them with -ENOENT...
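
The difference between overriding and propagating an error code can be seen in this user-space sketch. fake_get_irq() is a hypothetical stand-in for platform_get_irq(), and EPROBE_DEFER is defined locally because it is a kernel-internal value that user-space errno.h does not provide.

```c
#include <errno.h>

#define EPROBE_DEFER 517 /* kernel-internal errno, defined here for the demo */

/* Hypothetical stand-in: index 0 has an IRQ, others ask to defer. */
static int fake_get_irq(int idx)
{
	return idx == 0 ? 42 : -EPROBE_DEFER;
}

/* Old style: the real reason is lost, so deferred probing cannot work. */
static int init_irq_override(int idx)
{
	int irq = fake_get_irq(idx);

	if (irq < 0)
		return -ENOENT;
	return 0;
}

/* New style: hand the original error code back to the caller. */
static int init_irq_propagate(int idx)
{
	int irq = fake_get_irq(idx);

	if (irq < 0)
		return irq;
	return 0;
}
```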
Daniel Wagner [Thu, 20 May 2021 07:31:27 +0000 (09:31 +0200)]
scsi: scsi_transport_fc: Remove double FC_FPORT_DELETED in mask creation
Remove the double listed FC_VPORT_DELETING from the mask creation.
Commit 260f4aeddb48 ("scsi: scsi_transport_fc: return -EBUSY for deleted
vport") added FC_VPORT_DELETING to the flag masks. This is not necessary as
FC_VPORT_DEL is defined as FC_VPORT_DELETED | FC_VPORT_DELETING.
Bart Van Assche [Sun, 9 May 2021 21:38:17 +0000 (14:38 -0700)]
scsi: ufs: ufs-exynos: Move definitions from .h to .c
In the Linux kernel, definitions of data structures should occur in .c
files. Hence move the exynos7_uic_attr definition from a .h into a .c
file. Additionally, declare exynos_ufs_drvs static. This patch fixes the
following two sparse warnings:
drivers/scsi/ufs/ufs-exynos.h:248:28: warning: symbol 'exynos_ufs_drvs' was not declared. Should it be static?
drivers/scsi/ufs/ufs-exynos.h:250:28: warning: symbol 'exynos7_uic_attr' was not declared. Should it be static?
Samuel Holland [Tue, 27 Apr 2021 23:59:15 +0000 (18:59 -0500)]
scsi: 3w-9xxx: Fix endianness issues in command packets
The controller expects all data it sends/receives to be little-endian.
Therefore, the packet struct definitions should use the __le16/32/64
types. Once those are correct, sparse reports several issues with the
driver code, which are fixed here as well.
The main issue observed was at the call to scsi_set_resid(), where the
byteswapped parameter would eventually trigger the alignment check at
drivers/scsi/sd.c:2009. At that point, the kernel would continuously
complain about an "Unaligned partial completion", and no further I/O could
occur.
This gets the controller working on big endian powerpc64.
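
In the kernel this is handled with __le16/__le32/__le64 field types plus conversions such as cpu_to_le32() and le32_to_cpu(). The sketch below is a portable user-space stand-in with hypothetical helper names; it shows why a fixed byte order gives the same packet on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order, on any host. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a little-endian 32-bit value back into host order. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Skipping the conversion on a big-endian host hands the rest of the stack a byte-swapped value, which matches the kind of failure described for scsi_set_resid().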
Samuel Holland [Tue, 27 Apr 2021 23:59:14 +0000 (18:59 -0500)]
scsi: 3w-9xxx: Reduce scope of structure packing
Currently, all command packet structs used by this driver are packed.
However, only one (TW_SG_Entry) actually needs to be packed, because it
uses 64-bit addresses at 32-bit alignment. To improve the quality of
generated code, stop packing all of the other command packet structs. This
requires adjusting the type of one misaligned "reserved" member.
After this change, pahole reports that only one type had its layout change:
the tw_compat_info member of TW_Device_Extension is now naturally aligned.
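
The effect of packing on a struct that holds a 64-bit address next to a 32-bit length can be checked with sizeof. The two structs below are hypothetical models of such a scatter-gather entry, not the driver's TW_SG_Entry, and __attribute__((packed)) is a GCC/Clang extension.

```c
#include <stdint.h>

/* Packed: 12 bytes, so in an array every other address is 4-byte aligned. */
struct sg_entry_packed {
	uint64_t address;
	uint32_t length;
} __attribute__((packed));

/* Natural layout: the compiler may add tail padding to align address. */
struct sg_entry_natural {
	uint64_t address;
	uint32_t length;
};
```

On common 64-bit ABIs the natural layout is 16 bytes, so only a struct like this one, where the hardware expects the 12-byte form, needs the attribute.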
Samuel Holland [Tue, 27 Apr 2021 23:59:13 +0000 (18:59 -0500)]
scsi: 3w-9xxx: Use flexible array members to avoid struct padding
In preparation for removing the "#pragma pack(1)" from the driver, fix all
instances where a trailing array member could be replaced by a flexible
array member. Since a flexible array member has zero size, it introduces no
padding, whether or not the struct is packed.
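
The size difference between an old-style one-element trailing array and a flexible array member is easy to demonstrate. The struct and function names below are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Old style: the 1-byte array counts toward sizeof and drags in padding. */
struct old_style {
	uint32_t len;
	uint8_t  data[1];
};

/* Flexible array member: contributes nothing to sizeof. */
struct flex_style {
	uint32_t len;
	uint8_t  data[];
};

/* Allocate a header plus n payload bytes; the caller frees the result. */
static struct flex_style *flex_alloc(uint32_t n)
{
	struct flex_style *p = malloc(sizeof(*p) + n);

	if (p)
		p->len = n;
	return p;
}
```

With the flexible form, sizeof gives the header size directly, so allocation arithmetic no longer needs a "minus one" correction.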
Bart Van Assche [Thu, 13 May 2021 17:12:29 +0000 (10:12 -0700)]
scsi: ufs: core: Remove usfhcd_is_*_pm() macros
Remove these macros to make the UFS driver source code easier to read.
These macros were introduced by commit 57d104c153d3 ("ufs: add UFS power
management support").
Bump the SCSI primary command set standard to SPC-4. The upcoming version
descriptors will report newer SCSI standards (like SBC-3) that are not
defined in SPC-3.