Wednesday, February 24, 2016

Leveraging the Serengeti API with vSphere Big Data Extensions

I've been working with VMware Big Data Extensions with a couple of customers lately as we look at providing Hadoop as a Service (HaaS) via the Serengeti API. So what are Big Data Extensions (BDE) and the Serengeti API, and why would you use them?

What is it?

BDE is an orchestration layer for deploying and managing Hadoop clusters. It's deployed as an OVA and registered as a plug-in in the vCenter web interface. What is unique about BDE is that it allows VMware administrators to manage Hadoop clusters as a single instance, and it provides all of the under-the-hood orchestration. It supports both deploying and scaling clusters. BDE is available to all vSphere Enterprise Plus customers and is supported by VMware. You can get it here:

http://www.vmware.com/go/download-bigdataextensions

While BDE is the commercially supported release, it's built on a project that VMware released to the open source community called Serengeti. The open source Serengeti project can be found here:

https://github.com/vmware-serengeti

Why would I use it?

The BDE plugin is preconfigured to manage Hadoop clusters as a single instance, which is great if you are a VMware admin with access to vCenter. But what happens when you need to offer HaaS to data scientists, and you don't really want to give them access to vCenter? That's where the Serengeti API comes in: we can use it to call out to BDE from another platform.

If you already leverage vRealize Automation you are in luck: VMware has pre-built a plugin pack for vRealize Automation and Orchestration to offer HaaS. You can get it from the Solution Exchange here. But what happens if you use another portal? That's where the Serengeti API comes into play.

Through your portal you need to offer a service that authenticates with the Serengeti API, makes a call to create a new cluster, and passes the cluster definition as JSON. This lets you deploy a virtual Hadoop cluster through any portal that supports making calls to a RESTful API. Here are some curl examples of leveraging the API to get you started:

Authenticating with the Serengeti API:

curl -c cookies.txt -i -k -d 'j_username=user@vsphere.local&j_password=pass' -X POST https://10.25.90.124:8443/serengeti/j_spring_security_check

Successful response:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=227537474CEF0143C1BD3C922F5D8838; Path=/serengeti; Secure
Content-Length: 0
Date: Fri, 12 Feb 2016 13:22:39 GMT
Example of creating a cluster with a REST call:
curl -i -H "Content-type:application/json" -3 -b cookies.txt -X POST -d @default.json https://10.25.90.124:8443/serengeti/api/clusters --insecure --digest
Successful response:

HTTP/1.1 100 Continue
HTTP/1.1 202 Accepted
Server: Apache-Coyote/1.1
Location: https://10.25.90.124:8443/serengeti/api/task/180
Content-Length: 0
Date: Fri, 12 Feb 2016 15:34:33 GMT
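Note that the 202 Accepted response is asynchronous: the Location header points at a task resource you can poll until the deployment finishes. Here's a minimal Python sketch of that polling loop, assuming the task endpoint returns JSON with a "status" field that moves to a terminal value such as SUCCESS or FAILED (field names and values may differ by BDE version, so check against your deployment):

```python
import json
import ssl
import time
import urllib.request

def task_finished(task_json: str) -> bool:
    # Assumed response shape: {"status": "RUNNING" | "SUCCESS" | "FAILED", ...}
    # Verify the actual field name and values against your BDE version.
    status = json.loads(task_json).get("status", "")
    return status in ("SUCCESS", "FAILED")

def poll_task(task_url: str, session_cookie: str, interval: int = 30) -> str:
    """Poll the task URL from the Location header until the task completes."""
    # Skip certificate verification, matching curl's -k flag in the examples
    # above. Don't do this outside of a lab.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    while True:
        req = urllib.request.Request(
            task_url, headers={"Cookie": session_cookie})
        body = urllib.request.urlopen(req, context=ctx).read().decode()
        if task_finished(body):
            return body
        time.sleep(interval)

# Example (lab values from the responses above):
# poll_task("https://10.25.90.124:8443/serengeti/api/task/180",
#           "JSESSIONID=227537474CEF0143C1BD3C922F5D8838")
```

The session cookie is the JSESSIONID value returned by the authentication call, which curl stored in cookies.txt.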

Example JSON for an Apache Bigtop cluster (also on GitHub here). To use this with a cloud portal, the cluster-specific properties (node counts, sizes, name) would be variables passed through the portal interface.

{
  "name": "APITest",
  "externalHDFS": null,
  "distro": "bigtop",
  "distroVendor": "BIGTOP",
  "networkConfig": {
    "MGT_NETWORK": ["defaultNetwork"]
  },
  "topologyPolicy": "NONE",
  "nodeGroups": [{
    "name": "master",
    "roles": [
      "hadoop_namenode",
      "hadoop_resourcemanager"
    ],
    "cpuNum": 2,
    "memCapacityMB": 3500,
    "swapRatio": 1.0,
    "storage": {
      "type": "SHARED",
      "shares": null,
      "sizeGB": 10,
      "dsNames": null,
      "splitPolicy": null,
      "controllerType": null,
      "allocType": null
    },
    "instanceNum": 1
  }, {
    "name": "worker",
    "roles": [
      "hadoop_datanode",
      "hadoop_nodemanager"
    ],
    "cpuNum": 1,
    "memCapacityMB": 1024,
    "swapRatio": 1.0,
    "storage": {
      "type": "SHARED",
      "shares": null,
      "sizeGB": 10,
      "dsNames": null,
      "splitPolicy": null,
      "controllerType": null,
      "allocType": null
    },
    "instanceNum": 2
  }, {
    "name": "client",
    "roles": [
      "hadoop_client",
      "pig",
      "hive",
      "hive_server"
    ],
    "cpuNum": 1,
    "memCapacityMB": 1024,
    "swapRatio": 1.0,
    "storage": {
      "type": "SHARED",
      "shares": null,
      "sizeGB": 10,
      "dsNames": null,
      "splitPolicy": null,
      "controllerType": null,
      "allocType": null
    },
    "instanceNum": 1
  }]
}
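Since a portal will typically let the user pick the node counts and sizes, one way to template this payload is a small helper that fills in the variable fields. This is an illustrative sketch (the function name and defaults are mine, not part of the API); it mirrors the Bigtop example above but varies only the worker group for brevity:

```python
import json

def cluster_spec(name: str, worker_count: int = 2,
                 worker_cpus: int = 1, worker_mem_mb: int = 1024) -> str:
    """Build a Serengeti cluster-create payload from portal inputs.

    Mirrors the Bigtop example above; fields not exposed to the user
    keep that example's defaults.
    """
    storage = {"type": "SHARED", "shares": None, "sizeGB": 10,
               "dsNames": None, "splitPolicy": None,
               "controllerType": None, "allocType": None}
    spec = {
        "name": name,
        "externalHDFS": None,
        "distro": "bigtop",
        "distroVendor": "BIGTOP",
        "networkConfig": {"MGT_NETWORK": ["defaultNetwork"]},
        "topologyPolicy": "NONE",
        "nodeGroups": [
            {"name": "master",
             "roles": ["hadoop_namenode", "hadoop_resourcemanager"],
             "cpuNum": 2, "memCapacityMB": 3500, "swapRatio": 1.0,
             "storage": dict(storage), "instanceNum": 1},
            {"name": "worker",
             "roles": ["hadoop_datanode", "hadoop_nodemanager"],
             "cpuNum": worker_cpus, "memCapacityMB": worker_mem_mb,
             "swapRatio": 1.0, "storage": dict(storage),
             "instanceNum": worker_count},
        ],
    }
    return json.dumps(spec, indent=2)
```

The returned string can be written to a file and posted exactly like @default.json in the curl example above, or sent directly as the request body.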
