2017-08-04

LUN masking and unmasking in vSphere 6.0


While studying for my VCP6-DCV I tried LUN masking in my lab. After a fair amount of research on the web [1][2][3] I realized that something was not quite right. After a few days of work I developed a procedure for cleanly masking and unmasking an iSCSI LUN using vSphere 6.0 and FreeNAS.

Environment

  • vCenter (Web Client Version 6.0.0 Build 3617395)
  • ESXi 6.0.0 Update 3 (build-5050593), available here: "Deploying Nested ESXi is even easier now with the ESXi Virtual Appliance". The host is a member of an HA/DRS cluster.
  • FreeNAS-11.0-U1 (aa82cc58d) running as a virtual machine on bare metal ESXi 6.0 U3.
  • The iSCSI LUN hosts a datastore in a Storage DRS cluster. There is only one path to the LUN. (I haven't reached iSCSI multipathing yet in my studies.) This matters because that single path is the 'last' path to the device as far as vSphere is concerned.
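
Whether a path is the 'last' one can be checked by counting the paths that `esxcfg-mpath -L` reports for the device. The following is only a minimal sketch: it assumes the one-line-per-path layout that appears in the transcripts in this post (device name in the third column), and the function name `count_paths` is my own invention. It reads the command output from stdin so it can be tried without a host.

```shell
#!/bin/sh
# count_paths NAA
# Reads `esxcfg-mpath -L` output on stdin and prints how many paths
# belong to the given device. Assumes the device name is the third
# whitespace-separated field, as in the sample output in this post.
count_paths() {
    awk -v dev="$1" '$3 == dev { n++ } END { print n + 0 }'
}

# On a host this could be used as follows (not run here):
#   esxcfg-mpath -L | count_paths naa.6589cfc000000af147dc40524653c62f
# A result of 1 means the path is the last one to the device.
```

Dead paths that show "(no device)" instead of a device name are not counted, because their third field no longer matches the NAA.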

Problem


While trying to make this work I kept running into errors like these:

[root@TOPVCPESXi01:~] esxcli storage core claiming reclaim -d naa.6589cfc000000af147dc40524653c62f ; date
Unable to unclaim path vmhba33:C0:T4:L5 on device naa.6589cfc000000af147dc40524653c62f. Some paths may be left in an unclaimed state. You will need to claim them manually using the appropriate commands or wait for periodic path claiming to reclaim them automatically.

Thu Aug  3 17:17:07 UTC 2017

... or ...

[root@TOPVCPESXi01:~] esxcli storage filesystem unmount -l=Prod02-iSCSI_c62f
Volume 'Prod02-iSCSI_c62f' cannot be unmounted. Reason: Busy



... or ...

[root@TOPELHhost01:~] esxcli storage core adapter rescan --adapter=vmhba33
Rescan complete, however some dead paths were not removed because they were in use by the system. Please use the 'storage core device world list' command to see the VMkernel worlds still using these paths.



... or ...

[root@TOPVCPESXi01:~] esxcfg-mpath -L | grep "vmhba33:C0:T4:L5"

vmhba33:C0:T4:L5 state:dead (no device) vmhba33 0 4 5 (unclaimed) dead unknown iqn.1998-01.com.vmware:TOPVCPESXi01-1037b94c 00023d000001,iqn.2005-10.org.freenas.ctl.topolhnas01:iscsi-target5-p2-i1,t,2 

In some cases I resorted to killing processes on the LUN, only to end up with zombie processes that can only be removed by rebooting the host.
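
The rescan error above points at `storage core device world list`. Rather than killing anything, it can help to summarise which worlds hold the device. This is a rough sketch only: the function name `worlds_on_device` is hypothetical, and it assumes the four-column layout (device, world ID, open count, world name) that the command prints in the transcript later in this post. It reads the output from stdin.

```shell
#!/bin/sh
# worlds_on_device NAA
# Reads `esxcli storage core device world list` output on stdin and prints
# one line per world name holding the device: "<count> <world name>",
# sorted by world name. It only reports; it never kills anything.
worlds_on_device() {
    awk -v dev="$1" '
        $1 == dev { count[$4]++ }
        END { for (name in count) print count[name], name }
    ' | sort -k2,2
}

# On a host this could be used as follows (not run here):
#   esxcli storage core device world list | worlds_on_device naa.6589cfc000000af147dc40524653c62f
```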

Procedure: LUN masking


1.       Identify the name of the datastore, e.g. "Prod02-iSCSI_c62f".
2.       In this case the datastore is a member of a Storage DRS cluster. Put the datastore into Maintenance Mode.
3.       Assume that the datastore is mounted on the ESXi host.
4.       Use vCenter to unmount the datastore from the ESXi host. Navigate to "Storage > [Datastore Name] > Manage > Connectivity and Multipathing". Select the host. Click "Unmount".

5.       Disable the path to the device.



6.       This results in the following error:

Description        Type        Date Time        Task        Target        User
Task: Disable multiple path        Information        8/4/2017 5:17:09 AM        Disable multiple path        topvcpesxi01.corp.ad.local        VSPHERE.LOCAL\Administrator

Datastore Prod02-iSCSI_c62f mounted on host topvcpesxi01.corp.ad.local was inaccessible. The condition was cleared and the datastore is now accessible        Information        Friday, August 4, 2017 4:32:40 AM                topvcpesxi01.corp.ad.local                com.vmware.vc.HA.VmcpStorageFailureCleared


This is not surprising, as all hosts would be in scope for this action. It would be better to navigate to the host and disable the device/path there. However, I didn't do that.

7.       Look at logs to see why the previous step failed.

grep -i "vmhba33" /var/log/vmkernel.log | tail -n 5
grep -i "vmhba33" /var/log/syslog.log | tail -n 5
grep -i "vmhba33" /var/log/vmkwarning.log | tail -n 5

[root@TOPVCPESXi01:~] grep -i "vmhba33" /var/log/vmkernel.log | tail -n 5
2017-08-04T07:49:03.934Z cpu0:33105)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x15 (0x43b5802be880, 0) to dev "naa.6589cfc000000322f83109f9045ae485" on path "vmhba33:C0:T3:L4" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x1a 0x0. Act:NONE
2017-08-04T07:49:03.938Z cpu0:33036)vmw_psp_mru: psp_mruSelectPathToActivateInt:346: Changing active path from NONE to vmhba33:C0:T4:L5 for device "Unregistered Device".
2017-08-04T07:49:03.939Z cpu0:33036)VMWARE SCSI Id: Id for vmhba33:C0:T4:L5
2017-08-04T07:49:03.939Z cpu0:33103)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x15 (0x43b5802be880, 0) to dev "naa.6589cfc000000af147dc40524653c62f" on path "vmhba33:C0:T4:L5" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x1a 0x0. Act:NONE
2017-08-04T09:17:11.419Z cpu1:34815 opID=896438a2)WARNING: NMP: nmpPathSetState:2148: Path "vmhba33:C0:T4:L5" could not be disabled as it is the last working path to the device.

[root@TOPVCPESXi01:~] grep -i "vmhba33" /var/log/syslog.log | tail -n 5
2017-08-04T08:10:15Z 2017-08-04 08: 10:15,595 Host Profiles[36919]: INFO: ISCSI(1501834215.595152):Gathering vnic binding info for: vmhba33
2017-08-04T08:10:25Z 2017-08-04 08: 10:25,831 Host Profiles[36938]: INFO: ISCSI(1501834225.831451):Gathering sendtarget discovery info for: vmhba33
2017-08-04T08:10:25Z 2017-08-04 08: 10:25,964 Host Profiles[36938]: INFO: ISCSI(1501834225.964861):Gathering discovered target info for: vmhba33
2017-08-04T08:10:26Z 2017-08-04 08: 10:26,446 Host Profiles[36938]: INFO: ISCSI(1501834226.446851):Gathering static target info for: vmhba33
2017-08-04T08:10:26Z 2017-08-04 08: 10:26,780 Host Profiles[36938]: INFO: ISCSI(1501834226.780191):Gathering vnic binding info for: vmhba33

[root@TOPVCPESXi01:~] grep -i "vmhba33" /var/log/vmkwarning.log | tail -n 5
2017-08-04T07:48:59.375Z cpu0:33207)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:0 T:2 CN:0: iSCSI connection is being marked "ONLINE"
2017-08-04T07:48:59.381Z cpu0:33207)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:0 T:3 CN:0: iSCSI connection is being marked "ONLINE"
2017-08-04T07:48:59.387Z cpu0:33207)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:0 T:4 CN:0: iSCSI connection is being marked "ONLINE"
2017-08-04T07:48:59.393Z cpu0:33207)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:0 T:5 CN:0: iSCSI connection is being marked "ONLINE"
2017-08-04T09:17:11.419Z cpu1:34815 opID=896438a2)WARNING: NMP: nmpPathSetState:2148: Path "vmhba33:C0:T4:L5" could not be disabled as it is the last working path to the device.
[root@TOPVCPESXi01:~]


8.    While this is interesting, it turns out that this step is not necessary. Getting back on track: check the status of the datastore:

esxcli storage filesystem list | grep -i "Prod02-iSCSI_c62f"

[root@TOPVCPESXi01:~] esxcli storage filesystem list | grep -i "Prod02-iSCSI_c62f"
                                                   Prod02-iSCSI_c62f        5973f769-b87d07eb-0cb0-000c297ef042    false  VMFS-unknown version             0             0
[root@TOPVCPESXi01:~]


... and ...

esxcli storage vmfs extent list | grep -i "Prod02-iSCSI_c62f"

[root@TOPVCPESXi01:~] esxcli storage vmfs extent list | grep -i "Prod02-iSCSI_c62f"
Prod02-iSCSI_c62f        5973f769-b87d07eb-0cb0-000c297ef042              0  naa.6589cfc000000af147dc40524653c62f          1
[root@TOPVCPESXi01:~]


9.       Get NAA associated with datastore:

esxcfg-scsidevs -m | grep -i "Prod02-iSCSI_c62f"

[root@TOPVCPESXi01:~] esxcfg-scsidevs -m | grep -i "Prod02-iSCSI_c62f"
naa.6589cfc000000af147dc40524653c62f:1                           /vmfs/devices/disks/naa.6589cfc000000af147dc40524653c62f:1 5973f769-b87d07eb-0cb0-000c297ef042  0  Prod02-iSCSI_c62f
[root@TOPVCPESXi01:~]


... or ...

esxcli storage vmfs extent list | grep -i "Prod02-iSCSI_c62f"

[root@TOPVCPESXi01:~] esxcli storage vmfs extent list | grep -i "Prod02-iSCSI_c62f"
Prod02-iSCSI_c62f        5973f769-b87d07eb-0cb0-000c297ef042              0  naa.6589cfc000000af147dc40524653c62f          1
[root@TOPVCPESXi01:~]


10.   Navigate to "Hosts and Clusters > [Hostname] > Manage > Storage > Storage Devices" and select the device.
11.   Use the NAA to get the runtime name of the path:

esxcfg-mpath -L | grep naa.6589cfc000000af147dc40524653c62f

[root@TOPVCPESXi01:~] esxcfg-mpath -L | grep naa.6589cfc000000af147dc40524653c62f
vmhba33:C0:T4:L5 state:active naa.6589cfc000000af147dc40524653c62f vmhba33 0 4 5 NMP active san iqn.1998-01.com.vmware:TOPVCPESXi01-1037b94c 00023d000001,iqn.2005-10.org.freenas.ctl.topolhnas01:iscsi-target5-p2-i1,t,2
[root@TOPVCPESXi01:~]

12.   Use the NAA to show the processes (worlds) using the device [4][5]. Do not attempt to kill these; the result is zombie processes that require a reboot.

[root@TOPVCPESXi01:~] esxcli storage core device world list | grep 6589cfc000000af147dc40524653c62f
naa.6589cfc000000af147dc40524653c62f     32776           1  idle0
naa.6589cfc000000af147dc40524653c62f     32872           1  OCFlush
naa.6589cfc000000af147dc40524653c62f     33280           1  helper51-0
naa.6589cfc000000af147dc40524653c62f     33282           1  helper51-2
naa.6589cfc000000af147dc40524653c62f     33286           1  helper51-6
naa.6589cfc000000af147dc40524653c62f     33287           1  helper51-7
naa.6589cfc000000af147dc40524653c62f     33872           1  sdrsInjector
naa.6589cfc000000af147dc40524653c62f     33895           1  storageRM
naa.6589cfc000000af147dc40524653c62f     34123           1  hostd-worker
naa.6589cfc000000af147dc40524653c62f     34448           1  hostd-worker
naa.6589cfc000000af147dc40524653c62f     34449           1  hostd-worker
naa.6589cfc000000af147dc40524653c62f     34813           1  hostd-worker
naa.6589cfc000000af147dc40524653c62f     34814           1  hostd-worker
naa.6589cfc000000af147dc40524653c62f     34815           1  hostd-worker
naa.6589cfc000000af147dc40524653c62f     34832           1  hostd-worker
naa.6589cfc000000af147dc40524653c62f     34833           1  hostd-worker
naa.6589cfc000000af147dc40524653c62f     36739           1  hostd-worker
[root@TOPVCPESXi01:~]



13. View claimrules before starting:

[root@TOPVCPESXi01:~] esxcli storage core claimrule list
Rule Class   Rule  Class    Type       Plugin     Matches                            XCOPY Use Array Reported Values  XCOPY Use Multiple Segments  XCOPY Max Transfer Size
----------  -----  -------  ---------  ---------  ---------------------------------  -------------------------------  ---------------------------  -----------------------
MP              0  runtime  transport  NMP        transport=usb                                                false                        false                        0
MP              1  runtime  transport  NMP        transport=sata                                               false                        false                        0
MP              2  runtime  transport  NMP        transport=ide                                                false                        false                        0
MP              3  runtime  transport  NMP        transport=block                                              false                        false                        0
MP              4  runtime  transport  NMP        transport=unknown                                            false                        false                        0
MP            101  runtime  vendor     MASK_PATH  vendor=DELL model=Universal Xport                            false                        false                        0
MP            101  file     vendor     MASK_PATH  vendor=DELL model=Universal Xport                            false                        false                        0
MP          65535  runtime  vendor     NMP        vendor=* model=*                                             false                        false                        0
[root@TOPVCPESXi01:~]


14. Create new claimrule

[root@TOPVCPESXi01:~] esxcli storage core claimrule add -r 500 -t location -A vmhba33 -C 0 -T 4 -L 5 -P MASK_PATH
[root@TOPVCPESXi01:~]


15. View new claimrule. Note Class = file.

[root@TOPVCPESXi01:~] esxcli storage core claimrule list
Rule Class   Rule  Class    Type       Plugin     Matches                                   XCOPY Use Array Reported Values  XCOPY Use Multiple Segments  XCOPY Max Transfer Size
----------  -----  -------  ---------  ---------  ----------------------------------------  -------------------------------  ---------------------------  -----------------------
MP              0  runtime  transport  NMP        transport=usb                                                       false                        false                        0
MP              1  runtime  transport  NMP        transport=sata                                                      false                        false                        0
MP              2  runtime  transport  NMP        transport=ide                                                       false                        false                        0
MP              3  runtime  transport  NMP        transport=block                                                     false                        false                        0
MP              4  runtime  transport  NMP        transport=unknown                                                   false                        false                        0
MP            101  runtime  vendor     MASK_PATH  vendor=DELL model=Universal Xport                                   false                        false                        0
MP            101  file     vendor     MASK_PATH  vendor=DELL model=Universal Xport                                   false                        false                        0
MP            500  file     location   MASK_PATH  adapter=vmhba33 channel=0 target=4 lun=5                            false                        false                        0
MP          65535  runtime  vendor     NMP        vendor=* model=*                                                    false                        false                        0
[root@TOPVCPESXi01:~]


16. Load new claimrule:

[root@TOPVCPESXi01:~] esxcli storage core claimrule load
[root@TOPVCPESXi01:~]


17. Reclaim the device. I appended the date command to record when this was executed. During a previous attempt this command worked after a while, but because I hadn't recorded the time I couldn't estimate how long it took:

[root@TOPVCPESXi01:~] esxcli storage core claiming reclaim -d naa.6589cfc000000af147dc40524653c62f ; date
Fri Aug  4 09:40:20 UTC 2017
[root@TOPVCPESXi01:~]


18. View processes on the device (no output means no worlds are using it any more):

[root@TOPVCPESXi01:~] esxcli storage core device world list | grep 6589cfc000000af147dc40524653c62f | sed 's/     /*/g' | while read i;do echo ps | grep $i ;done
[root@TOPVCPESXi01:~]


19. View the claimrules. Note that rule 500 now appears with both Class = file and Class = runtime.

[root@TOPVCPESXi01:~] esxcli storage core claimrule list
Rule Class   Rule  Class    Type       Plugin     Matches                                   XCOPY Use Array Reported Values  XCOPY Use Multiple Segments  XCOPY Max Transfer Size
----------  -----  -------  ---------  ---------  ----------------------------------------  -------------------------------  ---------------------------  -----------------------
MP              0  runtime  transport  NMP        transport=usb                                                       false                        false                        0
MP              1  runtime  transport  NMP        transport=sata                                                      false                        false                        0
MP              2  runtime  transport  NMP        transport=ide                                                       false                        false                        0
MP              3  runtime  transport  NMP        transport=block                                                     false                        false                        0
MP              4  runtime  transport  NMP        transport=unknown                                                   false                        false                        0
MP            101  runtime  vendor     MASK_PATH  vendor=DELL model=Universal Xport                                   false                        false                        0
MP            101  file     vendor     MASK_PATH  vendor=DELL model=Universal Xport                                   false                        false                        0
MP            500  runtime  location   MASK_PATH  adapter=vmhba33 channel=0 target=4 lun=5                            false                        false                        0
MP            500  file     location   MASK_PATH  adapter=vmhba33 channel=0 target=4 lun=5                            false                        false                        0
MP          65535  runtime  vendor     NMP        vendor=* model=*                                                    false                        false                        0
[root@TOPVCPESXi01:~]


20. Rescan adapter:

[root@TOPVCPESXi01:~] esxcli storage core adapter rescan --adapter=vmhba33
[root@TOPVCPESXi01:~]


21. See if datastore is visible:

[root@TOPVCPESXi01:~] esxcli storage vmfs extent list | grep -i "Prod02-iSCSI_c62f"
[root@TOPVCPESXi01:~]


22. Check in vCenter.



23. Done LUN masking
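
The host-side commands from steps 14 to 20 can be generated from the runtime name of the path. What follows is a sketch under stated assumptions, not a tested tool: the function name `mask_commands` and the default rule number 500 are my own choices, and the script only prints the commands rather than running them. It does not cover the vCenter steps (Maintenance Mode, unmount) that come first.

```shell
#!/bin/sh
# mask_commands RUNTIME_NAME NAA [RULE_ID]
# Prints (does not run) the esxcli commands used above to mask one path.
# RUNTIME_NAME looks like vmhba33:C0:T4:L5; RULE_ID defaults to 500.
mask_commands() {
    path=$1 naa=$2 rule=${3:-500}

    # Split "vmhba33:C0:T4:L5" into adapter, channel, target and LUN.
    old_ifs=$IFS; IFS=:
    set -- $path
    IFS=$old_ifs
    [ $# -eq 4 ] || { echo "unexpected runtime name: $path" >&2; return 1; }
    adapter=$1 channel=${2#C} target=${3#T} lun=${4#L}

    echo "esxcli storage core claimrule add -r $rule -t location -A $adapter -C $channel -T $target -L $lun -P MASK_PATH"
    echo "esxcli storage core claimrule load"
    echo "esxcli storage core claiming reclaim -d $naa"
    echo "esxcli storage core adapter rescan --adapter=$adapter"
}

mask_commands vmhba33:C0:T4:L5 naa.6589cfc000000af147dc40524653c62f
```

Printing the commands first makes it easy to compare them with the transcript before running anything on a host.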


Procedure: unmasking


1.       Start.
2.       Remove claimrule:

[root@TOPVCPESXi01:~] esxcli storage core claimrule remove -r 500
[root@TOPVCPESXi01:~]

 
3.       Show that claimrule (Class = file) has been removed from file.

[root@TOPVCPESXi01:~] esxcli storage core claimrule list
Rule Class   Rule  Class    Type       Plugin     Matches                                   XCOPY Use Array Reported Values  XCOPY Use Multiple Segments  XCOPY Max Transfer Size
----------  -----  -------  ---------  ---------  ----------------------------------------  -------------------------------  ---------------------------  -----------------------
MP              0  runtime  transport  NMP        transport=usb                                                       false                        false                        0
MP              1  runtime  transport  NMP        transport=sata                                                      false                        false                        0
MP              2  runtime  transport  NMP        transport=ide                                                       false                        false                        0
MP              3  runtime  transport  NMP        transport=block                                                     false                        false                        0
MP              4  runtime  transport  NMP        transport=unknown                                                   false                        false                        0
MP            101  runtime  vendor     MASK_PATH  vendor=DELL model=Universal Xport                                   false                        false                        0
MP            101  file     vendor     MASK_PATH  vendor=DELL model=Universal Xport                                   false                        false                        0
MP            500  runtime  location   MASK_PATH  adapter=vmhba33 channel=0 target=4 lun=5                            false                        false                        0
MP          65535  runtime  vendor     NMP        vendor=* model=*                                                    false                        false                        0
[root@TOPVCPESXi01:~]


 
4.       Load updated claimrule:

[root@TOPVCPESXi01:~] esxcli storage core claimrule load
[root@TOPVCPESXi01:~]

 
5.       Verify that claimrule (Class = runtime) has been removed from runtime.

[root@TOPVCPESXi01:~] esxcli storage core claimrule list
Rule Class   Rule  Class    Type       Plugin     Matches                            XCOPY Use Array Reported Values  XCOPY Use Multiple Segments  XCOPY Max Transfer Size
----------  -----  -------  ---------  ---------  ---------------------------------  -------------------------------  ---------------------------  -----------------------
MP              0  runtime  transport  NMP        transport=usb                                                false                        false                        0
MP              1  runtime  transport  NMP        transport=sata                                               false                        false                        0
MP              2  runtime  transport  NMP        transport=ide                                                false                        false                        0
MP              3  runtime  transport  NMP        transport=block                                              false                        false                        0
MP              4  runtime  transport  NMP        transport=unknown                                            false                        false                        0
MP            101  runtime  vendor     MASK_PATH  vendor=DELL model=Universal Xport                            false                        false                        0
MP            101  file     vendor     MASK_PATH  vendor=DELL model=Universal Xport                            false                        false                        0
MP          65535  runtime  vendor     NMP        vendor=* model=*                                             false                        false                        0
[root@TOPVCPESXi01:~]

6.       Unclaim the path so that it is released from the MASK_PATH plugin:

[root@TOPVCPESXi01:~] esxcli storage core claiming unclaim -t location -A vmhba33 -C 0 -T 4 -L 5
[root@TOPVCPESXi01:~]

 
7.       Check status of Runtime Path:

[root@TOPVCPESXi01:~] esxcfg-mpath -L | grep "vmhba33:C0:T4:L5"
vmhba33:C0:T4:L5 state:active naa.6589cfc000000af147dc40524653c62f vmhba33 0 4 5 NMP active san iqn.1998-01.com.vmware:TOPVCPESXi01-1037b94c 00023d000001,iqn.2005-10.org.freenas.ctl.topolhnas01:iscsi-target5-p2-i1,t,2
[root@TOPVCPESXi01:~]

 
8.       Check status of Datastore:

[root@TOPVCPESXi01:~] esxcli storage filesystem list | grep -i "Prod02-iSCSI_c62f"
                                                   Prod02-iSCSI_c62f        5973f769-b87d07eb-0cb0-000c297ef042    false  VMFS-unknown version             0             0

 
9.       Check status of Datastore using different method:

[root@TOPVCPESXi01:~] esxcli storage vmfs extent list | grep -i "Prod02-iSCSI_c62f"
Prod02-iSCSI_c62f        5973f769-b87d07eb-0cb0-000c297ef042              0  naa.6589cfc000000af147dc40524653c62f          1
[root@TOPVCPESXi01:~]

 
10.   View in vCenter and verify that the LUN is visible again.



11.   Rescan just to be sure:

[root@TOPVCPESXi01:~] esxcli storage core adapter rescan -A vmhba33
[root@TOPVCPESXi01:~]
[root@TOPVCPESXi01:~] esxcli storage filesystem list | grep -i "Prod02-iSCSI_c62f"
                                                   Prod02-iSCSI_c62f        5973f769-b87d07eb-0cb0-000c297ef042    false  VMFS-unknown version             0             0
[root@TOPVCPESXi01:~] esxcli storage vmfs extent list | grep -i "Prod02-iSCSI_c62f"
Prod02-iSCSI_c62f        5973f769-b87d07eb-0cb0-000c297ef042              0  naa.6589cfc000000af147dc40524653c62f          1
[root@TOPVCPESXi01:~]


12.   Done unmasking.
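
Both procedures rely on reading the claimrule list by eye: the rule should be in the file and runtime classes after masking, and in neither after unmasking. A small helper can make that check explicit. This is a minimal sketch, assuming the column layout shown in the transcripts above (rule class, rule number, class); the function name `rule_classes` is hypothetical, and it reads the list from stdin.

```shell
#!/bin/sh
# rule_classes RULE_ID
# Reads `esxcli storage core claimrule list` output on stdin and prints the
# classes (runtime, file) in which the MP rule with that number appears,
# space-separated on one line. Prints an empty line if the rule is absent.
rule_classes() {
    awk -v id="$1" '
        $1 == "MP" && $2 == id { out = out (out ? " " : "") $3 }
        END { print out }
    '
}

# On a host this could be used as follows (not run here):
#   esxcli storage core claimrule list | rule_classes 500
```

With the listings in this post, the result would be "runtime file" once the LUN is masked, "runtime" after the rule has been removed but not yet reloaded, and empty once unmasking is complete.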


Conclusion



The fundamental flaw in my original approach was trying to do everything from the host. Masking a LUN using only commands on the host doesn't seem possible, or at least I haven't figured out how to do it. vCenter is evidently doing extra work in the background to make this work.

References


[4] Remove obsolete datastore from an ESXi host fails
[5] This link uses a different approach to show processes on device: Re: Can't unmount datastore(s) - file system is busy error even though it's not in use


