StoneFly’s Scale-Out NAS Storage plug-in for Hadoop

Hadoop and big data go hand in hand, but many companies feel Hadoop falls short on certain enterprise features they need. StoneFly’s Scale-Out NAS Storage offers an enterprise-grade alternative to the underlying Hadoop Distributed File System (HDFS), letting you keep data in a POSIX-compatible storage environment while running big data analytics with the Hadoop MapReduce framework.

To overcome the traditional limitations of hardware-based storage, StoneFly™ has created an HDFS plug-in that enables MapReduce to run directly on StoneFly’s Scale-Out NAS Storage. The plug-in uses Scale-Out NAS Storage volumes to run Hadoop jobs across multiple namespaces, so you can perform in-place analytics without migrating data into or out of HDFS.

Integration with the Hadoop ecosystem goes well beyond MapReduce and HDFS. The plug-in is compatible with Hadoop-based applications and supports technologies such as Hive, Pig, HBase, Tez, Sqoop, and Flume.

In this example we see four Scale-Out NAS Storage servers in a trusted storage pool, split between two zones for high availability. A separate server runs the “Ambari” management console, the “YARN Resource Manager” and the “Job History Server”. This architecture eliminates the centralized metadata server and supports a fully fault-tolerant system with two- or three-way replication across a cluster that can scale anywhere from 2 to 128 nodes.
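
As a rough illustration of what two- or three-way replication means for capacity, the sketch below estimates usable space as raw capacity divided by the replica count. The function name and the 10 TB per node figure are assumptions made for this example. It ignores filesystem overhead and any product-specific layout rules, so treat the result as a back-of-the-envelope number only.

```python
def usable_capacity_tb(node_count, raw_tb_per_node, replica_count):
    """Estimate usable capacity when every file is stored replica_count times.

    Simplified model (an assumption, not a vendor sizing formula):
    usable = total raw capacity / number of replicas.
    """
    if not 2 <= node_count <= 128:
        raise ValueError("node_count must be between 2 and 128")
    if replica_count not in (2, 3):
        raise ValueError("replica_count must be 2 or 3")
    if raw_tb_per_node <= 0:
        raise ValueError("raw_tb_per_node must be positive")
    return node_count * raw_tb_per_node / replica_count


if __name__ == "__main__":
    # Four nodes as in the example above; 10 TB per node is a made-up figure.
    for replicas in (2, 3):
        usable = usable_capacity_tb(4, 10, replicas)
        print(f"{replicas}-way replication: about {usable:.1f} TB usable")
```

Under this simplified model, moving from two-way to three-way replication trades roughly a third of the usable capacity for an extra copy of every file.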

To eliminate complex and time-consuming code rewrites, StoneFly’s Scale-Out NAS Storage supports several data access mechanisms: file access with NFS or SMB, object access with Swift, and access via the Hadoop file system API. You can use standard Linux tools and utilities such as grep, awk, and Python, and take advantage of multi-protocol support including native StoneFly Scale-Out NAS Storage, NFS, SMB, HCFS, and Swift.
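
Because the data sits on POSIX-compatible storage, ordinary file APIs can read it without a Hadoop client. The following is a minimal sketch of a grep-style scan in Python, assuming the volume is already mounted over NFS or a native client. The function name and the mount path in the usage note are hypothetical, not part of the StoneFly product.

```python
import re
from pathlib import Path


def count_matches(root, pattern):
    """Count lines matching a regular expression in every file under root.

    Returns a dict mapping each file's path (relative to root, with forward
    slashes) to its number of matching lines. Files with no matches are
    omitted from the result.
    """
    regex = re.compile(pattern)
    counts = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Undecodable bytes are replaced so binary files do not abort the scan.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            hits = sum(1 for line in handle if regex.search(line))
        if hits:
            counts[path.relative_to(root).as_posix()] = hits
    return counts
```

For example, `count_matches("/mnt/scaleout/logs", r"ERROR")` would scan a volume mounted at that hypothetical path, reading the same files a MapReduce job could process in place.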
You can also grow or shrink a cluster on the fly without impacting application availability, with automatic data rebalancing. Let’s take a closer look at the plug-in in action. From the “Ambari” management console, you can start all the services with the click of a button.
We see there are a number of Hadoop services on the “Ambari” manager node. There are also four nodes in the StoneFly Scale-Out NAS Storage cluster.
In the terminal window we see maps and reduces happening in real time on the Scale-Out NAS Storage nodes, and the management console shows us that all the work is complete.
The StoneFly Scale-Out NAS Storage plug-in for Apache Hadoop makes it painless and cost-effective to run analytics on data in Apache Hadoop, eliminating many of the challenges enterprises face when working with the Hadoop Distributed File System.

Get in touch with us to learn more about StoneFly’s Scale-Out NAS Storage.
