Select Page

 

StoneFly’s Scale-Out NAS Storage plug-in for Hadoop

26

December, 2016

Hadoop and big data go hand in hand, however many companies feel Hadoop comes up short on certain enterprise features they need. StoneFly’s Scale-Out NAS Storage offers an enterprise-grade alternative to the underlying Hadoop Distributed File System (HDFS) that enables you to keep data in a POSIX compatible storage environment while performing big data analytics with a Hadoop MapReduce Framework.

To overcome the traditional limitations of hardware-based storage, StoneFly™ has created an HDFS plug-in that enables MapReduce to run directly on StoneFly’s Scale-Out NAS Storage. This plugin uses Scale-Out NAS Storage volumes to run Hadoop jobs across multiple namespaces, allowing you to perform in-place analytics without migrating data in or out of HDFS.

Integrating the plugin into the Hadoop ecosystem goes well beyond MapReduce and HDFS. The Hadoop plug-in is compatible with Hadoop-based applications and supports technologies such as Hive, Pig HBase, Tez, Sqoop, Flume and more!

In this example we see four Scale-Out NAS Storage servers in a trusted storage pool, split between two zones for high-availability. A separate server runs the “Ambari” management console, the “Yarn Resource Manager” and the “Job History Server”.  This architecture eliminates the centralized metadata server and supports a fully fault-tolerant system with two or three way replication across a cluster that can scale anywhere from 2 to 128 nodes.

To eliminate complex and time-consuming code re-writes StoneFly’s Scale-Out NAS Storage supports data access to several different mechanisms. File access with NFS or SMB, object access with swift and access via the Hadoop file-system API. You can use standard Linux tools and utilities such as Grep, Awk and Python, and take advantage of multi-protocol support including native StoneFly Scale-Out NAS Storage, NFS, SMB, HCFS and swift.

You also have the ability to add or shrink a cluster on the fly without impacting application availability and perform automatic data re-balancing. Let’s take a closer look at the plugin in action. From the “Ambari” management console you’re able to start all the services with a click of a button:

We see there are a number of Hadoop services on the “Ambari” manager node. There are also four nodes in the StoneFly Scale-Out NAS Storage cluster.

In the terminal window we see maps and reduces happening in real time on the Scale-Out NAS Storage nodes, and the management console shows us that all the work is complete.

The StoneFly Scale-Out NAS Storage plugin for Apache Hadoop makes it painless and cost effective to run analytics on data in Apache Hadoop, eliminating many of the challenges enterprises face when working with the Hadoop distributed file system.

Get in touch with us to learn more about StoneFly’s Scale-Out NAS Storage.

Want new articles before they get published?
Subscribe to our Awesome Newsletter.

Related Post

Pin It on Pinterest