Thursday, November 17, 2016

Links from Nov 17th vForum - Houston

Wednesday, February 24, 2016

Leveraging the Serengeti API with vSphere Big Data Extensions

I've been working more with VMware Big Data Extensions lately with a couple of customers as we look at providing Hadoop as a Service (HaaS) through the Serengeti API. So what is Big Data Extensions (BDE), what is the Serengeti API, and why would I use it?

What is it?

BDE is an orchestration layer for deploying and managing Hadoop clusters. It's deployed as an OVA and registered as a plugin in the vCenter web interface. What is unique about BDE is that it lets VMware administrators manage a Hadoop cluster as a single instance, and it provides all of the under-the-hood orchestration. It supports both deploying and scaling the cluster. BDE is available to all Enterprise Plus ESXi customers and is supported by VMware. You can get it here:

http://www.vmware.com/go/download-bigdataextensions

While BDE is the commercially supported release, it's built on a project called Serengeti that VMware released to the open source community. The open source Serengeti project can be found here:

https://github.com/vmware-serengeti

Why would I use it?

The BDE plugin is preconfigured to manage Hadoop clusters as a single instance, which is great if you are a VMware admin with access to vCenter. But what happens when you need to offer HaaS to data scientists and you don't really want to give them access to vCenter? That's where the Serengeti API comes in: we can use it to call out to BDE from another platform.

If you already leverage vRealize Automation, you are in luck. VMware has pre-built a plugin pack for vRealize Automation and Orchestration to offer HaaS. You can get it from the Solution Exchange here. But what happens if you use another portal? That's where the Serengeti API comes into play.

Dig into the API after the break


Monday, February 8, 2016

Installing a signed SSL cert for EMC ECS Object services

I was working through a proof of concept this weekend for a customer backing up Cassandra to an S3 object store. Since I already had the EMC ECS community edition running in the lab, I had an S3 object store ready to go, but I needed to install a signed certificate on it to make my customer's backup of Cassandra data to object storage work.

If you are looking for an object store to play with, EMC ECS is available free for non-production use. You can get it here, and the EMC CODE team has been nice enough to package up Docker containers of the nodes.

Why do I need to do this?

You will want to leverage a signed cert any time your clients need secured access to the object store. You can use a cert from your internal certificate authority as long as it is added to the trusted root of your clients, or one from a public trusted certificate authority.
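As a rough client-side illustration of the trust requirement above, here is a minimal Python sketch that builds a TLS context which verifies the server certificate. The function name `build_tls_context` and the `ca_bundle` path are hypothetical and not part of ECS or any backup tool; the sketch only assumes the standard library `ssl` module.

```python
import ssl
from typing import Optional


def build_tls_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Return a client TLS context that verifies the server certificate.

    ca_bundle is a hypothetical path to a PEM file holding your internal
    CA certificate. Passing None falls back to the operating system's
    trusted roots, which is what you want for a publicly signed cert.
    """
    # create_default_context enables certificate verification and
    # hostname checking for client connections by default.
    return ssl.create_default_context(cafile=ca_bundle)


context = build_tls_context()
# A client could then pass the context along when it connects, e.g.
# urllib.request.urlopen(object_store_url, context=context), where
# object_store_url is whatever HTTPS endpoint your object store exposes.
```

If the certificate comes from an internal CA, pointing `ca_bundle` at that CA's PEM file plays the same role as adding the CA to the client's trusted root.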

Let's dive into the technical details after the break.

Tuesday, December 22, 2015

Slow Deploy from Template

I have run into this same issue at a couple of different customers, and it doesn't get much press considering how broad its impact is. It normally manifests as slow deploy-from-template operations at customers with larger vSphere environments, and the root cause is a vCenter bug in how those operations are performed.


If we look at this KB article, we can see that the way vCenter performs the clone operation changed with 5.1u2 and 5.5. In earlier versions, vCenter told the DESTINATION ESX host to perform the clone operation; starting in 5.1u2 and 5.5, vCenter began telling the SOURCE ESX host to perform it.

This doesn't cause issues in small environments where all ESX hosts have access to all datastores, but what happens in larger environments? There, performance depends on where the template VM is registered. Many times the template VMs are registered in management clusters that can't see the production storage. If vCenter tells the source ESX host (the one where the template is registered) to execute the copy and that host can't see the destination datastore, it copies the VMDK to the destination host over the MGMT VMkernel interface. This leads to slow clone times, and to timeouts when multiple clones run at once, since the ESX host where the template is registered is doing every clone operation.
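To make the reasoning above concrete, here is a small Python model of which path the clone data takes. It is only a sketch: the host names, datastore names, and the `clone_data_path` function are made up for illustration, and it does not call any vSphere API.

```python
from typing import Dict, Set


def clone_data_path(host_datastores: Dict[str, Set[str]],
                    executing_host: str,
                    source_ds: str,
                    dest_ds: str) -> str:
    """Say how the VMDK travels when executing_host performs the clone.

    If the executing host can see both the source and the destination
    datastore, the copy stays on the storage side. Otherwise the data has
    to cross the management network to reach the other host.
    """
    visible = host_datastores[executing_host]
    if source_ds in visible and dest_ds in visible:
        return "storage"
    return "management-network"


# Hypothetical inventory: the management host only sees its own
# datastore, while the production host sees both.
inventory = {
    "mgmt-esx01": {"mgmt-ds"},
    "prod-esx01": {"mgmt-ds", "prod-ds"},
}

# Source host executes the clone (5.1u2 / 5.5 behavior).
print(clone_data_path(inventory, "mgmt-esx01", "mgmt-ds", "prod-ds"))
# Destination host executes the clone.
print(clone_data_path(inventory, "prod-esx01", "mgmt-ds", "prod-ds"))
```

In this toy inventory the first call reports the management network and the second reports the storage path, which matches the difference between the two vCenter behaviors.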

To resolve the issue, upgrade vCenter to 5.5u2d or 6.0. This returns the behavior to the destination ESX host performing the copy operation. As long as all hosts have access to the MGMT datastore where the templates are stored, the destination ESX hosts can copy them directly over the storage network, many times with VAAI acceleration. During multiple deployments DRS assigns the new VMs to multiple ESX hosts, so the process doesn't overwhelm the single host where the templates are registered.

So, to resolve:

1. Create a MGMT datastore available to all ESX hosts at the site.
2. Place all templates on the MGMT datastore.
3. Make sure vCenter is at 5.5u2d or greater.
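The first two steps can be sanity-checked against an inventory export with something like the Python sketch below. The data structures and the `check_template_layout` function are hypothetical; in practice you would populate them from whatever inventory tooling you already use. The vCenter version in step 3 is not checked here.

```python
from typing import Dict, List, Set


def check_template_layout(host_datastores: Dict[str, Set[str]],
                          template_datastores: Dict[str, str],
                          mgmt_ds: str) -> List[str]:
    """Return a list of problems with the template layout (empty if none)."""
    problems = []
    # Step 1: every ESX host at the site must see the MGMT datastore.
    for host, datastores in sorted(host_datastores.items()):
        if mgmt_ds not in datastores:
            problems.append(f"{host} cannot see {mgmt_ds}")
    # Step 2: every template must live on the MGMT datastore.
    for template, datastore in sorted(template_datastores.items()):
        if datastore != mgmt_ds:
            problems.append(f"{template} is on {datastore}, not {mgmt_ds}")
    return problems


# Made-up example data.
hosts = {
    "mgmt-esx01": {"mgmt-ds"},
    "prod-esx01": {"mgmt-ds", "prod-ds"},
    "prod-esx02": {"prod-ds"},
}
templates = {"rhel7-template": "mgmt-ds", "win2012-template": "local-ds"}

for problem in check_template_layout(hosts, templates, "mgmt-ds"):
    print(problem)
```

An empty result means every host can reach the templates directly over the storage network.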