Pluggable Storage Architecture (PSA)
vSphere ESXi uses a special VMkernel layer, the Pluggable Storage Architecture (PSA), to manage storage multipathing. This open, modular framework coordinates the simultaneous operation of multiple multipathing plug-ins (MPPs).
PSA is a collection of VMkernel APIs that allow storage vendors to insert code into the ESXi storage I/O path. Developers can design their own load-balancing techniques and failover mechanisms for a particular storage array.
The PSA coordinates the operation of the NMP and any additional third-party MPPs.
Native Multipathing Plugin (NMP)
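The NMP is the multipathing module that ESXi provides by default. If more multipathing functionality is required, a third party can also provide an MPP to run in addition to, or as a replacement for, the default NMP.
A multipathing module performs the following operations: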
- Manages physical path claiming and unclaiming.
- Registers and de-registers logical devices.
- Associates physical paths with logical devices.
- Processes I/O requests to logical devices:
  - Selects an optimal physical path for the request (load balancing).
  - Performs actions necessary to handle failures and request retries.
- Supports management tasks such as abort or reset of logical devices.
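The operations listed above can be illustrated with a small toy model. This is only a sketch, not VMware's implementation or the real PSA API: the names PhysicalPath, LogicalDevice, claim, unclaim, select_path and submit_io are hypothetical, and round-robin is just one possible path selection policy.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalPath:
    """Hypothetical stand-in for one physical path to a device."""
    name: str
    alive: bool = True


@dataclass
class LogicalDevice:
    """Hypothetical logical device that owns a set of claimed paths."""
    device_id: str
    paths: list = field(default_factory=list)
    _next: int = 0  # index where the next round-robin search starts

    def claim(self, path):
        """Associate a physical path with this logical device."""
        self.paths.append(path)

    def unclaim(self, name):
        """Drop the path with the given name from this device."""
        self.paths = [p for p in self.paths if p.name != name]
        self._next = 0

    def select_path(self):
        """Return the next live path in round-robin order, or None."""
        n = len(self.paths)
        for offset in range(n):
            idx = (self._next + offset) % n
            if self.paths[idx].alive:
                self._next = (idx + 1) % n
                return self.paths[idx]
        return None

    def submit_io(self, send):
        """Send one request, failing over until a path succeeds.

        `send` is a caller-supplied function that takes a path and
        returns True on success. A path that fails is marked dead.
        """
        for _ in range(len(self.paths)):
            path = self.select_path()
            if path is None:
                break
            if send(path):
                return path.name
            path.alive = False
        raise OSError("all paths to %s are dead" % self.device_id)
```

With two claimed paths and a send function that always succeeds, successive submit_io calls alternate between the paths. If one path starts failing, it is marked dead and the request is retried on the remaining path.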
Permanent Device Loss (PDL)
PDL is a storage state indicating the permanent removal of a device. A device in this state shows the following symptoms:
- The datastore is shown as unavailable in the Storage view.
- The storage adapter indicates that the Operational State of the device is Lost Communication.
- All paths to the device are marked as Dead.
This state typically occurs when a LUN at a storage array is unmapped or deleted. The VMkernel core storage stack knows the device is not coming back because the storage array informs the host of a PDL state through a SCSI command response. When all paths have PDL errors, the removal is considered permanent.
PDL can occur in two ways:
1. Planned PDL: the storage device removal workflow is followed (the LUN is unmounted and detached from all affected hosts first)
2. Unplanned PDL: the storage device is removed at the storage array level without informing ESXi
The device may return from PDL, but there is no guarantee of data consistency.
If a LUN is removed without the ESXi host being prepared for it, the host may enter an All Paths Down (APD) state.
If a device does not return PDL SCSI sense codes (because the host cannot contact the storage array, or because the array does not return the supported PDL SCSI codes), the device is in an All Paths Down (APD) state, and the ESXi host continues to send I/O requests until it receives a response.
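The distinction between PDL and APD can be sketched as a small classification function. This is an illustration only, not how the VMkernel is implemented: classify_device and DeviceState are hypothetical names, and the sense code table is an assumption based on the sense key / ASC / ASCQ combinations commonly documented as PDL indicators, so it should be checked against current VMware documentation.

```python
from enum import Enum

# Assumed PDL indicators as (sense key, ASC, ASCQ) triples.
PDL_SENSE_CODES = {
    (0x5, 0x25, 0x00),  # LOGICAL UNIT NOT SUPPORTED
    (0x4, 0x4C, 0x00),  # LOGICAL UNIT FAILED SELF-CONFIGURATION
    (0x4, 0x3E, 0x03),  # LOGICAL UNIT FAILED SELF-TEST
    (0x4, 0x3E, 0x01),  # LOGICAL UNIT FAILURE
}


class DeviceState(Enum):
    OK = "ok"
    PDL = "permanent device loss"
    APD = "all paths down"


def classify_device(path_results):
    """Classify a device from the latest result seen on each path.

    Each entry in `path_results` is one of:
      "ok"              the path answered normally
      None              no response was received on the path
      (key, asc, ascq)  the path returned SCSI sense data
    """
    if not path_results:
        raise ValueError("a device needs at least one path")
    if any(result == "ok" for result in path_results):
        return DeviceState.OK
    # The loss counts as permanent only if every path reports a PDL code.
    if all(result in PDL_SENSE_CODES for result in path_results):
        return DeviceState.PDL
    # Otherwise the host cannot tell whether the loss is permanent.
    return DeviceState.APD
```

For example, two paths that both return (0x5, 0x25, 0x00) classify as PDL, while one such path plus one silent path (None) classifies as APD, because not all paths have reported a PDL error.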
As the ESXi host is not able to determine if the device loss is permanent (PDL) or transient (APD), it indefinitely retries SCSI I/O, including:
- Userworld I/O (
- Virtual machine guest I/O
Note: If an I/O request is issued from a guest, the guest operating system should time out and abort the I/O.
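The interaction between host retries and the guest timeout can be modeled roughly as follows. This is a simplified simulation with a fake clock, not real ESXi behavior: retry_guest_io, its parameters and the one-second retry interval are all hypothetical. The point it illustrates is that the host side has no stopping condition of its own, so the only bound on the loop comes from the guest's timeout.

```python
def retry_guest_io(send_io, guest_timeout_s, retry_interval_s=1):
    """Retry an I/O until it completes or the guest gives up.

    `send_io` is a caller-supplied function returning True when the
    device answers. Time is simulated in whole seconds, so nothing
    actually sleeps. Returns a (status, attempts) tuple.
    """
    elapsed_s = 0
    attempts = 0
    while elapsed_s < guest_timeout_s:
        attempts += 1
        if send_io():
            return ("completed", attempts)
        # No answer: the host would simply retry after a short wait.
        elapsed_s += retry_interval_s
    # The guest operating system timed out and aborted the request.
    return ("aborted by guest", attempts)
```

A device that never answers ends with the guest aborting the request once its timeout expires. A device that comes back during the retry window completes the request on a later attempt.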
Due to the nature of an APD situation, there is no clean way to recover.
- The APD situation needs to be resolved at the storage array/fabric layer to restore connectivity to the host.
- All affected ESXi hosts may require a reboot to remove any residual references to the affected devices that are in an APD state.
Note: Performing a vMotion migration of unaffected virtual machines is not possible, as the management agents may be affected by the APD condition, and the ESXi host may become unmanaged. As a result, a reboot of an affected ESXi host forces an outage to all non-affected virtual machines on that host.