PSA, NMP, PDL, APD

Pluggable Storage Architecture (PSA)

vSphere ESXi uses a special VMkernel layer, the Pluggable Storage Architecture (PSA), to manage storage multipathing. This open, modular framework coordinates the simultaneous operation of multiple multipathing plugins (MPPs).

PSA is a collection of VMkernel APIs that allow storage vendors to insert code into the ESXi storage I/O path. Developers can design their own load-balancing techniques and failover mechanisms for a particular storage array.

The PSA coordinates the operation of the VMware Native Multipathing Plugin (NMP) and any additional third-party MPPs.
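The coordinating role described above can be pictured with a small model. This is a hypothetical Python sketch, not VMkernel code: `PluggableStorageFramework`, `MultipathingPlugin`, the `VENDOR_MPP` plugin and the `AcmeArray` vendor string are all invented for illustration. Each device is offered to the registered plugins in order, and the first plugin that wants it becomes its only owner, with the NMP acting as the catch-all. Real ESXi hosts decide ownership through claim rules, which this ordered list only loosely imitates.

```python
from dataclasses import dataclass, field


@dataclass
class MultipathingPlugin:
    """A hypothetical MPP: it owns the devices it claims."""
    name: str
    # Array vendors this plugin wants to own; an empty set means
    # "claim anything", i.e. a catch-all default plugin.
    vendors: frozenset = frozenset()
    devices: list = field(default_factory=list)

    def wants(self, vendor: str) -> bool:
        return not self.vendors or vendor in self.vendors


class PluggableStorageFramework:
    """Offers each device to the registered plugins in registration order."""

    def __init__(self) -> None:
        self._plugins: list = []

    def register(self, plugin: MultipathingPlugin) -> None:
        self._plugins.append(plugin)

    def claim(self, device_id: str, vendor: str) -> str:
        # The first plugin that wants the device becomes its only owner.
        for plugin in self._plugins:
            if plugin.wants(vendor):
                plugin.devices.append(device_id)
                return plugin.name
        raise LookupError(f"no plugin claimed {device_id}")


psa = PluggableStorageFramework()
psa.register(MultipathingPlugin("VENDOR_MPP", frozenset({"AcmeArray"})))
psa.register(MultipathingPlugin("NMP"))  # catch-all, registered last
print(psa.claim("lun-1", "AcmeArray"))   # VENDOR_MPP
print(psa.claim("lun-2", "OtherArray"))  # NMP
```

Registering the vendor plugin before the catch-all NMP is what lets it take precedence for its own arrays while every other device still ends up with the NMP.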

Native Multipathing Plugin (NMP)

The VMkernel multipathing plugin that ESX/ESXi provides by default is the VMware Native Multipathing Plugin (NMP), a generic MPP. The NMP is an extensible module that manages subplugins. There are two types of NMP subplugins: Storage Array Type Plugins (SATPs), which handle array-specific operations such as path failover, and Path Selection Plugins (PSPs), which choose the physical path used for each I/O request. SATPs and PSPs can be built in and provided by VMware, or provided by a third party.
If more multipathing functionality is required, a third party can also provide an MPP to run in addition to, or as a replacement for, the default NMP.
What does the NMP do?
  • Manages physical path claiming and unclaiming.
  • Registers and de-registers logical devices.
  • Associates physical paths with logical devices.
  • Processes I/O requests to logical devices:
    • Selects an optimal physical path for the request (load balancing).
    • Performs actions necessary to handle failures and request retries.
  • Supports management tasks such as abort or reset of logical devices.
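The path-handling duties listed above can be sketched as a toy model. This is an illustration under stated assumptions, not how the NMP is implemented: `LogicalDevice`, `PathError` and the `send` callback are hypothetical, path selection is plain round robin (loosely in the spirit of VMware's round-robin PSP, VMW_PSP_RR), and any error on a path is treated as fatal for that path, whereas in the real stack the SATP decides how a failure is handled.

```python
from itertools import cycle
from typing import Callable, List, Set


class PathError(Exception):
    """Raised by the simulated transport when an I/O fails on a path."""


class LogicalDevice:
    """Hypothetical logical device with the physical paths claimed for it."""

    def __init__(self, name: str, paths: List[str]) -> None:
        self.name = name
        self.paths = list(paths)       # physical paths associated with this device
        self.dead: Set[str] = set()    # paths marked dead after a failure
        self._rr = cycle(self.paths)   # round-robin order used for path selection

    def select_path(self) -> str:
        """Return the next path in round-robin order that is not dead."""
        for _ in range(len(self.paths)):
            path = next(self._rr)
            if path not in self.dead:
                return path
        raise PathError(f"all paths down for {self.name}")

    def submit(self, io: str, send: Callable[[str, str], str]) -> str:
        """Issue one I/O; on a path error, mark that path dead and retry elsewhere."""
        while True:
            path = self.select_path()  # raises PathError once every path is dead
            try:
                return send(path, io)
            except PathError:
                self.dead.add(path)    # failover: stop using the failed path


def send(path: str, io: str) -> str:
    """Simulated transport in which the first path is broken."""
    if path == "vmhba1:C0:T0:L1":
        raise PathError("link down")
    return f"{io} via {path}"


dev = LogicalDevice("lun-1", ["vmhba1:C0:T0:L1", "vmhba2:C0:T0:L1"])
print(dev.submit("READ", send))  # READ via vmhba2:C0:T0:L1
print(dev.dead)                  # {'vmhba1:C0:T0:L1'}
```

When every path has been marked dead, `select_path` raises instead of looping forever, which is the toy equivalent of the device having no usable path left.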

Permanent Device Loss (PDL)

PDL is a storage state indicating that a device has been permanently removed. In this state:
– the datastore is shown as unavailable in the Storage view
– the storage adapter indicates that the Operational State of the device is Lost Communication
– all paths to the device are marked as Dead
This state typically occurs when a LUN on a storage array is unmapped or deleted. The VMkernel core storage stack knows the device is not coming back because the storage array informs the host of the PDL state through SCSI sense codes in its command responses. When all paths return PDL errors, the removal is considered permanent.
PDL can occur in two ways:
1. Planned PDL: the device goes through the storage device removal workflow (the LUN is unmounted and detached from the hosts before it is removed).
2. Unplanned PDL: the storage device is removed at the storage array level without informing ESXi.

The device may return from PDL, but there is no guarantee of data consistency.

If a LUN is removed without ESXi being prepared for it, the host may enter an All-Paths-Down (APD) state.

All-Paths-Down (APD)

If PDL SCSI sense codes are not returned from a device (because the host cannot contact the storage array, or because the array does not return the supported PDL SCSI codes), the device is in an All-Paths-Down (APD) state, and the ESXi host continues to send I/O requests until it receives a response.
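The PDL-versus-APD decision can be sketched as a classification over per-path results. This is a hypothetical Python model: `PathResult`, `DeviceState` and `classify` are invented names, and it assumes that only the LOGICAL UNIT NOT SUPPORTED sense (sense key 0x5, ASC 0x25, ASCQ 0x00) signals PDL. The set of sense codes a real ESXi host treats as PDL is larger and depends on the release. A device counts as PDL only when every path reports a PDL code; if some or all paths simply stop answering, the host cannot tell whether the loss is permanent and the device is treated as APD.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

# (sense key, ASC, ASCQ) triples treated as PDL in this sketch only.
PDL_SENSE = {(0x5, 0x25, 0x00)}  # LOGICAL UNIT NOT SUPPORTED


@dataclass
class PathResult:
    alive: bool                                   # did the path complete I/O normally?
    sense: Optional[Tuple[int, int, int]] = None  # sense data returned, if any


class DeviceState(Enum):
    OK = "at least one working path"
    PDL = "permanent device loss"
    APD = "all paths down"


def classify(results: List[PathResult]) -> DeviceState:
    if any(r.alive for r in results):
        return DeviceState.OK
    # Every path has failed: permanent only if all of them reported PDL.
    if results and all(r.sense in PDL_SENSE for r in results):
        return DeviceState.PDL
    # Otherwise the host cannot prove the loss is permanent.
    return DeviceState.APD


lost = PathResult(alive=False, sense=(0x5, 0x25, 0x00))
silent = PathResult(alive=False)  # e.g. array unreachable, no sense data
print(classify([lost, lost]).name)    # PDL
print(classify([lost, silent]).name)  # APD
```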

Because the ESXi host cannot determine whether the device loss is permanent (PDL) or transient (APD), it retries SCSI I/O indefinitely, including:

  • Userworld I/O (hostd management agent)
  • Virtual machine guest I/O

Note: If an I/O request is issued from a guest, the guest operating system should time out and abort the I/O.
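The retry behavior can be illustrated with a simulated timeline. In this hypothetical Python sketch, `apd_io_outcome` and its timing parameters are invented, and the numbers in the usage lines are arbitrary rather than ESXi or guest OS defaults. It only models what the text states: the host keeps retrying while the device is unreachable, so either connectivity comes back or the guest's own I/O timeout ends the wait.

```python
from typing import Optional, Tuple


def apd_io_outcome(guest_timeout_s: float,
                   retry_interval_s: float,
                   restored_at_s: Optional[float] = None) -> Tuple[str, float, int]:
    """Return (outcome, simulated seconds elapsed, host retries) for one guest I/O.

    restored_at_s=None models a device that never comes back.
    """
    if retry_interval_s <= 0:
        raise ValueError("retry_interval_s must be positive")
    elapsed, retries = 0.0, 0
    while True:
        # The host keeps retrying until the device responds...
        if restored_at_s is not None and elapsed >= restored_at_s:
            return "completed", elapsed, retries
        # ...so only the guest OS timeout can abort the I/O.
        if elapsed >= guest_timeout_s:
            return "aborted by guest", elapsed, retries
        elapsed += retry_interval_s
        retries += 1


print(apd_io_outcome(30, 5))                    # ('aborted by guest', 30.0, 6)
print(apd_io_outcome(30, 5, restored_at_s=12))  # ('completed', 15.0, 3)
```

Nothing on the host side of the loop ever gives up, which mirrors why the note above relies on the guest operating system to abort the I/O.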

Due to the nature of an APD situation, there is no clean way to recover.

  • The APD situation needs to be resolved at the storage array/fabric layer to restore connectivity to the host.
  • All affected ESXi hosts may require a reboot to remove any residual references to the affected devices that are in an APD state.

Note: Performing a vMotion migration of the unaffected virtual machines is not possible, because the management agents may be affected by the APD condition and the ESXi host may become unmanaged. As a result, rebooting an affected ESXi host forces an outage on all of its non-affected virtual machines.
