2017-07-26

Static routes for ESXi 6.0

While configuring one of my hosts for iSCSI I encountered problems with network connectivity. An additional vmkernel interface was created for iSCSI on a VLAN/subnet different from the storage appliance's. Pings from the ESXi host to the default gateway, and from the router/switch back to the vmkernel IP, were successful, but pings between the ESXi host and the storage appliance did not work. This ended up being a case of duh on my part for failing to realize that I needed a static route. A good night's sleep helped.

  • Storage appliance is on the 10.1.6.0/24 subnet.
  • The new vmkernel interface is on the 10.2.6.0/24 subnet. In my lab 10.2.0.0/16 is meant to represent a remote site.
  • All VLANs and subnets are defined on a single Cisco router/switch which means that all routes are directly connected.

Configuration of the vmkernel interfaces; this does not change during the fix (long lines wrapped). iSCSI is defined on vmk1.

[root@VAPELHhost01:/] esxcfg-vmknic --list
Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type                NetStack
vmk0       Management Network                      IPv4      10.2.4.10                               255.255.255.0   10.2.4.255      2c:59:e5:39:fb:14 1500    65535     true    STATIC              defaultTcpipStack
vmk1       0206-iSCSI                              IPv4      10.2.6.10                               255.255.255.0   10.2.6.255      00:50:56:62:37:5d 1500    65535     true    STATIC              defaultTcpipStack
[root@VAPELHhost01:/]


Routing table before the fix. There is no entry for 10.1.6.0/24, so traffic to the storage appliance follows the default route out vmk0:

[root@VAPELHhost01:~] esxcfg-route --list
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.2.4.0         255.255.255.0    Local Subnet     vmk0
10.2.6.0         255.255.255.0    Local Subnet     vmk1
default          0.0.0.0          10.2.4.1         vmk0


My first attempt to fix the problem failed because I used the IP address of the iSCSI vmkernel interface itself as the gateway (next hop).

Add (incorrect) static route using the vmkernel interface's own IP as the gateway.

[root@VAPELHhost01:~] esxcfg-route --add 10.1.6.0/24 10.2.6.10
Adding static route 10.1.6.0/24 to VMkernel


New (incorrect) routing table. Interface vmk1 has a new route.

[root@VAPELHhost01:~] esxcfg-route --list
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.1.6.0         255.255.255.0    10.2.6.10        vmk1
10.2.4.0         255.255.255.0    Local Subnet     vmk0
10.2.6.0         255.255.255.0    Local Subnet     vmk1
default          0.0.0.0          10.2.4.1         vmk0


New (incorrect) route fails.

[root@VAPELHhost01:~] esxcli network diag ping --host=TOPOLHNAS01is.local --interface=vmk1
   Summary:
         Duplicated: 0
         Host Addr: TOPOLHNAS01is.local
         Packet Lost: 100
         Recieved: 0
         Roundtrip Avg MS: -2147483648
         Roundtrip Max MS: 0
         Roundtrip Min MS: 999999000
         Transmitted: 3
   Trace:
[root@VAPELHhost01:~] esxcli network diag ping --host=TOPOLHNAS01is.local
   Summary:
         Duplicated: 0
         Host Addr: TOPOLHNAS01is.local
         Packet Lost: 100
         Recieved: 0
         Roundtrip Avg MS: -2147483648
         Roundtrip Max MS: 0
         Roundtrip Min MS: 999999000
         Transmitted: 3
   Trace:
[root@VAPELHhost01:~]


I realize my mistake and try again, this time using the IP address of the default gateway for the local iSCSI subnet.

Remove incorrect static route.

[root@VAPELHhost01:~] esxcfg-route --del 10.1.6.0/24 10.2.6.10
Deleting static route 10.1.6.0/24 from VMkernel


Add correct route, use default gateway of the local iSCSI subnet.

[root@VAPELHhost01:~] esxcfg-route --add 10.1.6.0/24  10.2.6.1
Adding static route 10.1.6.0/24 to VMkernel


Test new (corrected) static route. Success.

[root@VAPELHhost01:~] esxcli network diag ping --host=TOPOLHNAS01is.local --interface=vmk1
   Summary:
         Duplicated: 0
         Host Addr: TOPOLHNAS01is.local
         Packet Lost: 0
         Recieved: 3
         Roundtrip Avg MS: 524
         Roundtrip Max MS: 1036
         Roundtrip Min MS: 268
         Transmitted: 3
   Trace:
         Detail:
         Dup: false
         Host: 10.1.6.10
         ICMPSeq: 0
         Received Bytes: 64
         Roundtrip Time MS: 1037
         TTL: 63

         Detail:
         Dup: false
         Host: 10.1.6.10
         ICMPSeq: 1
         Received Bytes: 64
         Roundtrip Time MS: 268
         TTL: 63

         Detail:
         Dup: false
         Host: 10.1.6.10
         ICMPSeq: 2
         Received Bytes: 64
         Roundtrip Time MS: 268
         TTL: 63
[root@VAPELHhost01:/] esxcli network diag ping --host=TOPOLHNAS01is.local
   Summary:
         Duplicated: 0
         Host Addr: TOPOLHNAS01is.local
         Packet Lost: 0
         Recieved: 3
         Roundtrip Avg MS: 2606
         Roundtrip Max MS: 7304
         Roundtrip Min MS: 241
         Transmitted: 3
   Trace:
         Detail:
         Dup: false
         Host: 10.1.6.10
         ICMPSeq: 0
         Received Bytes: 64
         Roundtrip Time MS: 7305
         TTL: 63

         Detail:
         Dup: false
         Host: 10.1.6.10
         ICMPSeq: 1
         Received Bytes: 64
         Roundtrip Time MS: 274
         TTL: 63

         Detail:
         Dup: false
         Host: 10.1.6.10
         ICMPSeq: 2
         Received Bytes: 64
         Roundtrip Time MS: 242
         TTL: 63
[root@VAPELHhost01:/]


New routing table. The route to the remote iSCSI subnet (10.1.6.0/24) now points out interface vmk1 via the gateway on the local iSCSI subnet, i.e. 10.2.6.1.

[root@VAPELHhost01:/tmp/iscsi2] esxcfg-route --list
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.1.6.0         255.255.255.0    10.2.6.1         vmk1
10.2.4.0         255.255.255.0    Local Subnet     vmk0
10.2.6.0         255.255.255.0    Local Subnet     vmk1
default          0.0.0.0          10.2.4.1         vmk0
[root@VAPELHhost01:/tmp/iscsi2]
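
As an aside, KB 2001426 [1] also documents esxcli equivalents of the esxcfg-route commands used above. A minimal sketch of what I believe those would look like here (not captured from this session):

# list the VMkernel routing table (same information as esxcfg-route --list)
esxcli network ip route ipv4 list
# add the same static route to the remote iSCSI subnet via the local gateway
esxcli network ip route ipv4 add --network 10.1.6.0/24 --gateway 10.2.6.1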


The final step is to inform vCenter of the changes by restarting the management agents on the host:

[root@VAPELHhost01:~] /etc/init.d/hostd restart
watchdog-hostd: Terminating watchdog process with PID 109254
hostd stopped.
Ramdisk 'hostd' with estimated size of 803MB already exists
hostd started.
[root@VAPELHhost01:~] /etc/init.d/vpxa restart
watchdog-vpxa: Terminating watchdog process with PID 109940
vpxa stopped.
[root@VAPELHhost01:~]
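
Alternatively, KB 1003490 [2] describes restarting all of the management agents in one shot; a heavier-handed option that I did not use here:

# restarts hostd, vpxa and the other management agents together
/sbin/services.sh restart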


Then refresh the vSphere Web Client.

vSphere Web Client showing updated routing table.

Version of ESXi.

[root@VAPELHhost01:~] vmware -v
VMware ESXi 6.0.0 build-5050593
[root@VAPELHhost01:~]


Done.


References:
  1. Configuring static routes for vmkernel ports on an ESXi host (2001426)
  2. Restarting the Management agents in ESXi (1003490)

2017-07-25

Unable to GPT format the disk "da3": gpart: geom 'da3': Operation not permitted on FreeNAS 11

While configuring my FreeNAS virtual appliance for iSCSI I encountered a problem with re-creating device volumes [1]. A pop-up appeared in the FreeNAS GUI with the message:

[MiddlewareError: b'Unable to GPT format the disk "da3": gpart: geom \'da3\': Operation not permitted\n']

As there was no data on it, I decided to try wiping the device (Storage > Volumes > View Disks, "Wipe"). Even a full wipe of the device didn't make a difference. I then tried removing the device and replacing it with a new (HDD) device, and that made the symptom go away, but I soon hit the same symptom on a different device, so I tried a different approach. After doing some research [2][3][4][5] I came up with this:

topolhnas01mg# diskdevice=da8
topolhnas01mg# disksizeblocks=$( dmesg | grep -i "byte sectors" | grep -i "^$diskdevice[:].*" | head -1 | cut -d"(" -f 2 | cut -d" " -f 1 )
topolhnas01mg# partitionsizeblocks=34
topolhnas01mg# seekblocks=$( echo $disksizeblocks - $partitionsizeblocks | bc )
topolhnas01mg# sysctl kern.geom.debugflags=16
kern.geom.debugflags: 0 -> 16
topolhnas01mg# dd if=/dev/zero of=/dev/$diskdevice bs=512 count=$partitionsizeblocks
34+0 records in
34+0 records out
17408 bytes transferred in 0.002115 secs (8232582 bytes/sec)
topolhnas01mg# dd if=/dev/zero of=/dev/$diskdevice bs=512 seek=$seekblocks
dd: /dev/da8: end of device
35+0 records in
34+0 records out
17408 bytes transferred in 0.002137 secs (8145309 bytes/sec)
topolhnas01mg#


After running this the FreeNAS virtual appliance was rebooted and I was able to create the volumes. Later I was able to successfully format the iSCSI devices with VMFS.

This is a super quick way of overwriting the GPT partition tables at the beginning and end of the disk device. The key to the speed is that only 34 blocks at the beginning and end of the HDD are overwritten (GPT keeps a primary table at the start of the disk and a backup copy at the end). I suspect that this would work equally well for other operating systems (FreeBSD, Linux) that implement dmesg, but I haven't tried. This method in no way attempts to address bad blocks or other media issues.
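
Two follow-up steps I would consider on FreeBSD/FreeNAS after this kind of wipe (a sketch, not captured in the session above): checking the device with gpart show, which should no longer report a partition table, and putting kern.geom.debugflags back to its default of 0.

# should now complain that there is no such geom, confirming the GPT is gone
gpart show da8
# restore the GEOM debug flags to the default once the wipe is done
sysctl kern.geom.debugflags=0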

I suspect this happened because I upgraded and then downgraded my physical ESXi host while trying to get the "ESXi MAC Learn DvFilter" [6] working. That is another story.



[1] Ungraceful error if volume create fails if disk needs wipe
[2] This is where I got the idea of overwriting the GPT partition tables. (Unable to GPT format the disk ada0; unable to wipe ada0)
[3] Comment from DBronx highlighted the importance of "sysctl kern.geom.debugflags=16" for this to work. (Error: [MiddlewareError: Unable to GPT format the disk "ada0"])
[4] Another page that suggests overwriting the GPT partition table and clue as to size of partition table (GPT Rejected - how to wipe for ZFS?)
[5] First hint at how to overwrite the GPT partition table at the end of the HDD device (How to completely wipe a hard drive?)
[6] ESXi Learnswitch – Enhancement to the ESXi MAC Learn DvFilter

2017-07-13

vSphere/vCloud home lab - new direction

About that Dell C6100. It's on hold permanently. There were a number of problems that didn't have easy solutions and I needed to get on with my VCP6-DCV studies.

First problem was noise. Because the C6100 is four servers crammed into one 2U package and you can stuff it with 28 HDDs, there is a great need for cooling, which means fans, which means noise. I mean a lot of noise. Like freakishly loud. Not your normal 1U or 2U rack server noise. Worse. It was bad enough that I put it in a separate room and ran cables through the wall. Even on the other side of the wall the C6100 is very loud. I live in a high-rise apartment, which means there is no basement option.

Second problem was lack of native support for ESXi 6.0. I did find an article on adding the necessary drivers ... blah, blah, blah. Even though it might have made for a good exercise, I didn't want to spend a lot of time on this.

Third problem was capacity. RAM and CPU were fine, but I made the mistake of cheaping out on the HDDs. My original idea was to use VSAN as the C6100 does not have RAID, only JBOD (just a bunch of disks). I didn't feel like buying more disks.

My solution was to purchase a couple of HP ProLiant DL360e Gen8 servers, each with 2 x Intel E5-2430L, 96 GB RAM, and 3 x 1.2 TB SAS 10kRPM HDDs. This included RAID and iLO Advanced (allows graphics mode via the out-of-band management interface). I was fortunate that a local reseller of used computer equipment had a shipment of these units. A special shout-out to Micropeer.

The result is that I have been able to get further, faster in setting up my vSphere lab. Because the DL360e Gen8 supports ESXi 6.0 natively, the install was easy, with the bonus that ESXi 6.0 offers ESXi 6.x as a guest OS type, which makes nesting ESXi easier.

So now I have a vSphere lab that will not make me deaf or go insane. More details in another post.