vSAN on HPE Synergy Composable Infrastructure – Part 2

Firstly, apologies for the delay in posting the follow-up to this series. I am pulling the remaining posts together now so I can publish them in quicker succession, and hopefully all of them will be up around VMworld US!

So, this blog post is going to dive into the configuration and components of our vSAN setup using HPE Synergy, based on the overview in Part 1 where I mentioned this would be supporting VDI workloads.

Before going further, a little terminology to help you follow along with the rest of the blog articles:
Frame – Enclosure chassis which can hold up to 12 compute modules and 6 interconnect modules.
Interconnect – Linkage between blade connectivity and datacentre such as Fibre Channel, Ethernet, and SAS.
Compute Module – Blade server containing CPU, Memory, PCI Expansion cards, and optionally Disk.
Storage Module – Storage chassis within the frame which can hold up to 40 SFF drives (SAS / SATA / SSD). Each storage module occupies 2 compute slots.
Stack – Between 1 and 3 Synergy frames combined together to form a single logical enclosure. Allows sharing of Ethernet uplinks to datacentre with Virtual Connect.
Virtual Connect – HPE technology to allow multiple compute nodes to share a smaller set of uplinks to datacentre networking infrastructure. Acts similar to a switch internal to the Synergy stack.

All of the vSAN nodes are contained within a single Synergy frame. The main reason is that HPE does not currently support SAS connectivity across multiple frames within a stack, so the compute nodes accessing storage must sit in the same frame as the storage module. You can mix the density of storage vs. compute within Synergy however you like; a frame with a single storage module has 10 bays left for compute.

Our vSAN configuration is laid out as follows:
1 x Synergy 12000 Frame with:
  2 x SAS Interconnect Modules
  2 x Synergy 20Gb Interconnect Link Modules
  1 x D3940 Storage Module with:
    10 x 800GB SAS SSDs for Cache
    30 x 3.84TB SATA SSDs for Capacity
    2 x I/O Modules
  10 x SY480 Gen10 compute modules with:
    768GB RAM (24 x 32GB DDR4 Sticks)
    2 x Intel Xeon Gold 6140 CPUs (2.3GHz x 18 cores)
    2 x 300GB SSDs (for ESXi Installation)
    Synergy 3820C CNA / Network Adapter
    P416ie-m SmartArray Controller
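
As a rough sanity check on the figures above, the totals can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: it uses the counts from the list, the even split of drives across the 10 compute modules is an assumption for illustration, and the raw capacity figure ignores any vSAN storage policy or filesystem overhead.

```python
# Back-of-the-envelope totals for the frame described above.
FRAME_COMPUTE_BAYS = 12          # compute bays per Synergy frame
BAYS_PER_STORAGE_MODULE = 2      # each storage module occupies 2 compute slots
storage_modules = 1
compute_bays_left = FRAME_COMPUTE_BAYS - storage_modules * BAYS_PER_STORAGE_MODULE

nodes = 10
cache_drives, cache_gb = 10, 800          # 10 x 800GB SAS SSDs
capacity_drives, capacity_tb = 30, 3.84   # 30 x 3.84TB SATA SSDs

raw_cache_tb = cache_drives * cache_gb / 1000
raw_capacity_tb = capacity_drives * capacity_tb

# Assumption (not stated in the post): drives are split evenly across nodes.
cache_per_node = cache_drives // nodes
capacity_per_node = capacity_drives // nodes

total_ram_gb = nodes * 24 * 32    # 24 x 32GB per node
total_cores = nodes * 2 * 18      # 2 CPUs x 18 cores per node

print(f"Compute bays left:  {compute_bays_left}")
print(f"Raw cache tier:     {raw_cache_tb:.1f} TB")
print(f"Raw capacity tier:  {raw_capacity_tb:.1f} TB")
print(f"Drives per node:    {cache_per_node} cache + {capacity_per_node} capacity")
print(f"Total RAM / cores:  {total_ram_gb} GB / {total_cores}")
```

Usable capacity will be lower than the raw figure once the vSAN storage policy is taken into account.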

The above frame is actually housed within a logical enclosure, or stack, containing 3 frames. This means the entire stack shares 2 redundant Virtual Connect interconnects out to the physical switching infrastructure, but in our configuration these are in a different frame to the one containing the vSAN nodes. The frames in the stack are linked by 20Gb Interconnect Link Modules to a pair of master Virtual Connect modules. For our environment, we have 4 x 40GbE uplinks to the physical switching infrastructure per stack (2 per redundant interconnect).

We keep our datacentre networking relatively simple, so all VLANs are trunked through the Virtual Connect switches directly to ESXi. We decided not to have any separation of networking, or internal networking configured within Virtual Connect. Therefore, vSAN replication traffic and vMotion traffic will traverse out to the physical switching infrastructure and hairpin back in. This is of little concern given the bandwidth available to the stack.

That’s all for an overview of the hardware. But do let me know if there is any other detail you would like to see surrounding this topic! The next post will detail how a blade is ‘cabled’ to use the storage and network in a profile.

vSAN on HPE Synergy Composable Infrastructure – Part 1

It’s been a while since I have posted; I have been pulled in many different directions with work priorities, so blogging took a temporary side-line! I am now back and going to blog about a project I am currently working on to build out an extended VDI vSAN environment.

Our existing vSAN environment is running on DL380 G9 rackmounts which, whilst we had some initial teething issues, have been particularly solid and reliable of late!

We are almost at the point of exhausting the CPU and memory resources in this environment, and the vSAN datastores across the 3 clusters are about 60% utilized. With this, it felt like a natural fit to expand our vSAN environment as we continue the migration to Windows 10 and manage the explosive growth of the environment, aided by recently implementing Microsoft Azure MFA authentication vs 2-factor using a VPN connection.

As an organization, we are about to refresh a number of HP Gen8 blades in our datacentre. In looking at a move to Gen10, and knowing that this could be the last generation to support the C7000 chassis, we thought it would be a good time to look at other solutions. This is where HPE Synergy composable infrastructure came in! After an initial purchase of 4 frames, and a business requirement causing us to expand this further, we felt that expanding vSAN could be a good fit for Synergy with the D3940 storage module.

Now that we have the hardware racked up in the datacentre, I will be going through a series of blogs on how vSAN looks in HPE Synergy composable infrastructure, our configuration, and some of the Synergy features / automation capabilities which make this easier to implement vs the traditional DL380 Gen9 rackmount hardware we have in place today. Stay tuned or follow my Twitter handle for notifications of more in this series.

New vSAN cmdlet: Get-VSANObjectHealth

Been a while since I posted, so about time I caught up with some things I have been working on!

I will be posting a series of PowerCLI / PowerShell cmdlets to assist with vSAN Day-2 operations. There is plenty out there for creating new clusters, but not much for helping to troubleshoot / support production environments. Some of these have come from the VMworld Europe Hackathon I recently attended, which was a blast! I’ll do my best to credit others who helped contribute to the creation of these cmdlets too!

The first one I created, Get-VSANObjectHealth, helps obtain the state of the objects in a vSAN cluster. This is a great way to validate how things are in the environment, as it gives you the status of the objects, particularly whether any are degraded or rebuilding. The idea of this cmdlet was to integrate it into our host remediation script so I could verify that all the objects are healthy prior to moving on to the next host. I trust this verification in PowerCLI more than VUM attempting to put a host into maintenance mode and then getting stuck.
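
To illustrate the idea of gating host remediation on object health, here is a minimal, language-neutral sketch in Python. It is not the remediation script itself: `wait_until_healthy`, `remediate_hosts`, and the callables passed in are hypothetical names for illustration. In practice the health check would be something like the cmdlet's HealthyOnly switch returning True or False.

```python
import time
from typing import Callable, Iterable, List


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float = 3600.0,
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_healthy() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_healthy():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def remediate_hosts(
    hosts: Iterable[str],
    remediate: Callable[[str], None],
    is_healthy: Callable[[], bool],
    **wait_kwargs,
) -> List[str]:
    """Remediate hosts one at a time, only while all objects are healthy.

    Stops at the first host where the cluster does not become healthy in
    time, and returns the hosts that were actually remediated.
    """
    done: List[str] = []
    for host in hosts:
        if not wait_until_healthy(is_healthy, **wait_kwargs):
            break  # objects still degraded or rebuilding: do not touch the next host
        remediate(host)
        done.append(host)
    return done
```

The point of the design is that the loop never starts on another host while objects are still rebuilding, and gives up cleanly rather than hanging if the cluster does not recover within the timeout.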

Code is below and also posted on GitHub Here

How to use:

Details on the Switches:

-HealthyOnly
Will only return True if all objects are healthy, else will return False.

-ShowObjectUUIDs
Will extend the query to include an array of ObjectUUIDs for each health category. Good if you need to investigate specific objects.

-UseCachedInfo
Will use the cached vSAN Health data from the vCenter server rather than forcing an update from the cluster. Great if you need a quick check, but not recommended if you need a current picture of object health (such as just after exiting from Maintenance Mode).

Enjoy, and please do let me know (either via comments here or GitHub) if there are any other enhancements you would like to see! More cmdlets to come soon.

vSAN Invalid State Component Metadata Fixed!

Just a quick note to follow on from my post regarding the invalid component metadata on vSAN. This is now fixed by VMware in the latest round of patches.

https://kb.vmware.com/kb/2145347

I recently upgraded my lab servers to the latest round of patches (end June 2017), and the errors which appeared after updating the disk firmware went away when I applied the VUM patches. Nice!

Get vSAN Invalid State Metadata using PowerShell

We have had some ‘fun’ with our All-Flash vSAN clusters recently, after updating to 6.0 U3 and then VMware certifying new HBA firmware / driver for our HPE DL380 servers. Every time I have updated / rebooted, we end up with invalid metadata:

We’ve had a couple of tickets with VMware on this issue now, and a fix for this is still outstanding. It was scheduled for 6.0 U3 but failed a regression test. So for now when we patch / reboot we have to go fixing these!

The vSAN health page shown above only shows the affected component. On a Hybrid environment, you can remove the capacity disk from the diskgroup and re-add to resolve this, but for All-Flash you need to remove the entire diskgroup. Our servers have 2 x diskgroups per host, so we need to identify which diskgroup needs destroying.

To discover the diskgroup, you have to identify which capacity disk the affected component resides on. There is a VMware KB article for this, but it never worked on our vSAN nodes, so VMware support provided us with a different set of commands to obtain this. VMware KB is HERE.

Now I’ve ended up doing this several times, so I decided to pull this into a PowerShell function to make life easier. It will return an object showing the host, Disk UUID, Disk Name & Affected components:

The script does require Posh-SSH, as it connects to the host over SSH to obtain the information. You can download this over at the PowerShell Gallery.

Here’s the code I put together:

To use this, just pass in the VMHostName and Root Credential (using Get-Credential) or just call the function and it will ask for those and run. Of course for this to work you will need to have SSH access enabled for your hosts, and afaik it will not work in lockdown mode.

With the output you can see which capacity disks / diskgroups are affected and need to be deleted and re-created. As you can see from the first screenshot, I’ve had quite a few of these to do today! 🙁
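
The last step, going from affected capacity disks to the diskgroups that need re-creating, can be sketched as a simple grouping. This is an illustrative Python sketch rather than part of the function above: the row fields mirror the output described earlier (host, disk UUID, disk name, affected components), while `diskgroups_to_rebuild`, the disk-to-diskgroup mapping, and every value in the sample data are made-up placeholders.

```python
from collections import defaultdict


def diskgroups_to_rebuild(affected_rows, disk_to_diskgroup):
    """Group affected component IDs by (host, diskgroup).

    affected_rows: dicts with 'host', 'disk_uuid', 'disk_name', 'components'.
    disk_to_diskgroup: maps (host, disk_uuid) to a diskgroup label.
    """
    grouped = defaultdict(set)
    for row in affected_rows:
        diskgroup = disk_to_diskgroup[(row["host"], row["disk_uuid"])]
        grouped[(row["host"], diskgroup)].update(row["components"])
    return {key: sorted(components) for key, components in grouped.items()}


# Placeholder data only: none of these names come from a real environment.
rows = [
    {"host": "esx01", "disk_uuid": "disk-uuid-a", "disk_name": "disk-a",
     "components": ["comp-1", "comp-2"]},
    {"host": "esx01", "disk_uuid": "disk-uuid-b", "disk_name": "disk-b",
     "components": ["comp-3"]},
    {"host": "esx02", "disk_uuid": "disk-uuid-c", "disk_name": "disk-c",
     "components": ["comp-4"]},
]
membership = {
    ("esx01", "disk-uuid-a"): "diskgroup-1",
    ("esx01", "disk-uuid-b"): "diskgroup-1",
    ("esx02", "disk-uuid-c"): "diskgroup-2",
}

result = diskgroups_to_rebuild(rows, membership)
for (host, diskgroup), components in sorted(result.items()):
    print(f"{host} {diskgroup}: {len(components)} affected component(s)")
```

Two affected disks in the same diskgroup collapse into a single entry, which matches the All-Flash case where the whole diskgroup has to be removed anyway.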

vSAN is pretty darn good when it’s running, but we do have these challenges when any maintenance is required, which is a little more involved than it seems. That said, I think some of our challenges stem from using All-Flash; our small Hybrid test cluster, on the other hand, has always been solid!

Happy Hyper-converging!