I've been working more with VMware Big Data Extensions alongside a couple of customers as we look at providing Hadoop as a Service (HaaS) using the Serengeti API. So what is Big Data Extensions (BDE), what is the Serengeti API, and why would I use it?
What is it?
BDE is an orchestration layer for deploying and managing Hadoop clusters. It's deployed as an OVA and registered as a plug-in in the vCenter web interface. What is unique about BDE is that it allows VMware administrators to manage a Hadoop cluster as a single instance, and it provides all of the under-the-hood orchestration. It supports both deploying and scaling the cluster. BDE is available to all Enterprise Plus ESXi customers and is supported by VMware. You can get it here: http://www.vmware.com/go/download-bigdataextensions
While BDE is the commercially supported release, it's built on a project called Serengeti that VMware released to the open source community. The open source Serengeti project can be found here:
https://github.com/vmware-serengeti
Why would I use it?
The BDE plug-in is preconfigured to manage Hadoop clusters as a single instance, which is great if you are a VMware admin with access to vCenter. But what happens when you need to offer HaaS to data scientists and you don't really want to give them access to vCenter? That's where the Serengeti API comes in: we can use it to call out to BDE from another platform.
If you already leverage vRealize Automation, you are in luck. VMware has pre-built a plug-in pack for vRealize Automation and Orchestration to offer HaaS. You can get it from the Solution Exchange here. But what happens if you use another portal? That's where the Serengeti API comes into play.
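To make "calling out to BDE from another platform" a bit more concrete, here is a rough Python sketch of how a portal might prepare two REST calls against the BDE management server: one to log in and one to list clusters. Treat everything specific in it as an assumption rather than the documented API: the port (8443), the login path (/serengeti/j_spring_security_check), the form fields (j_username, j_password), the cluster path (/serengeti/api/clusters) and the helper names are my illustrative guesses based on how the open source Serengeti web service is commonly described, so check them against the API documentation for your BDE version. The functions only build the requests; nothing is sent until you pass one to urllib.request.urlopen.

```python
import urllib.parse
import urllib.request


def base_url(host, port=8443):
    # Assumption: the management server serves the API over HTTPS on 8443
    # under the /serengeti context path.
    return f"https://{host}:{port}/serengeti"


def build_login_request(host, username, password, port=8443):
    """Build (but do not send) a form-login POST request.

    Assumption: the login path and the j_username / j_password field
    names are illustrative and may differ in your BDE release.
    """
    form = urllib.parse.urlencode(
        {"j_username": username, "j_password": password}
    ).encode("ascii")
    return urllib.request.Request(
        f"{base_url(host, port)}/j_spring_security_check",
        data=form,
        method="POST",
    )


def build_list_clusters_request(host, session_cookie, port=8443):
    """Build (but do not send) a GET request that lists Hadoop clusters.

    Assumption: /api/clusters is a hypothetical path, and session_cookie
    is whatever cookie string the login response returned.
    """
    req = urllib.request.Request(
        f"{base_url(host, port)}/api/clusters",
        method="GET",
    )
    req.add_header("Cookie", session_cookie)
    req.add_header("Accept", "application/json")
    return req
```

A portal would send the login request first, read the session cookie from the response headers, and pass it to the second helper. Keeping request construction separate from sending makes it easy to inspect or log exactly what the portal is about to call before pointing it at a real BDE server.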
Dig into the API after the break