
StoneFly’s Scale-Out NAS Storage plug-in for Hadoop

Hadoop and big data go hand in hand; however, many companies feel Hadoop comes up short on certain enterprise features they need. StoneFly’s Scale-Out NAS Storage offers an enterprise-grade alternative to the underlying Hadoop Distributed File System (HDFS), letting you keep data in a POSIX-compatible storage environment while running big data analytics with the Hadoop MapReduce framework.

To overcome the traditional limitations of hardware-based storage, StoneFly™ has created an HDFS plug-in that enables MapReduce to run directly on StoneFly’s Scale-Out NAS Storage. The plug-in uses Scale-Out NAS Storage volumes to run Hadoop jobs across multiple namespaces, so you can perform in-place analytics without migrating data into or out of HDFS.


The plug-in’s integration with the Hadoop ecosystem goes well beyond MapReduce and HDFS: it is compatible with Hadoop-based applications and supports technologies such as Hive, Pig, HBase, Tez, Sqoop, Flume and more.

In this example, we see four Scale-Out NAS Storage servers in a trusted storage pool, split between two zones for high availability. A separate server runs the “Ambari” management console, the “YARN Resource Manager” and the “Job History Server”. This architecture eliminates the centralized metadata server and supports a fully fault-tolerant system with two- or three-way replication across a cluster that can scale from 2 to 128 nodes.
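The zoned, replicated layout can be illustrated with a small sketch. The article does not describe how StoneFly assigns replicas, so the placement policy below is a hypothetical one: nodes are interleaved across zones and consecutive nodes form a replica set, so each set spans zones and losing one zone leaves a copy of every set online. Usable capacity is taken as raw capacity divided by the replica count. The node names, zone labels and 10 TB per-node figure are placeholders, not StoneFly specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    zone: str
    capacity_tb: float


def build_replica_sets(nodes: list[Node], replica_count: int) -> list[list[Node]]:
    """Group nodes into replica sets that span zones (hypothetical policy)."""
    if replica_count not in (2, 3):
        raise ValueError("replica_count must be 2 or 3")
    if not 2 <= len(nodes) <= 128:
        raise ValueError("cluster must have between 2 and 128 nodes")
    if len(nodes) % replica_count:
        raise ValueError("node count must be a multiple of replica_count")
    # Bucket nodes by zone, preserving their original order within each zone.
    by_zone: dict[str, list[Node]] = {}
    for node in nodes:
        by_zone.setdefault(node.zone, []).append(node)
    # Round-robin across zones so neighbouring nodes come from different zones.
    queues = [list(members) for members in by_zone.values()]
    interleaved: list[Node] = []
    while any(queues):
        for queue in queues:
            if queue:
                interleaved.append(queue.pop(0))
    # Each run of `replica_count` consecutive nodes holds copies of the same data.
    return [interleaved[i:i + replica_count]
            for i in range(0, len(interleaved), replica_count)]


def usable_capacity_tb(nodes: list[Node], replica_count: int) -> float:
    """Raw capacity divided by the number of copies kept of each file."""
    return sum(node.capacity_tb for node in nodes) / replica_count


if __name__ == "__main__":
    # Four hypothetical servers split between two zones, as in the example.
    cluster = [
        Node("nas1", "zone-a", 10.0),
        Node("nas2", "zone-a", 10.0),
        Node("nas3", "zone-b", 10.0),
        Node("nas4", "zone-b", 10.0),
    ]
    for replica_set in build_replica_sets(cluster, replica_count=2):
        print([node.name for node in replica_set])
    print(usable_capacity_tb(cluster, replica_count=2), "TB usable")
```

With two-way replication the four nodes form the sets `nas1`/`nas3` and `nas2`/`nas4`, and 40 TB of raw capacity yields 20 TB usable; three-way replication would trade more capacity for tolerance of an additional failure.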

To eliminate complex and time-consuming code rewrites, StoneFly’s Scale-Out NAS Storage supports several data-access mechanisms: file access with NFS or SMB, object access with Swift and access via the Hadoop file-system API. You can use standard tools and utilities such as grep, awk and Python, and take advantage of multi-protocol support including native StoneFly Scale-Out NAS Storage, NFS, SMB, HCFS (Hadoop Compatible File System) and Swift.
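Because the storage is POSIX-compatible, ordinary file APIs can read the same data that Hadoop jobs analyze, with no export step. The sketch below is a minimal grep-like scan in plain Python, assuming the volume is mounted at a hypothetical path such as `/mnt/scaleout-nas/logs`; the mount point, file pattern and search expression are placeholders.

```python
import re
from pathlib import Path
from typing import Iterator


def grep_volume(root: str, pattern: str,
                glob: str = "*.log") -> Iterator[tuple[str, int, str]]:
    """Yield (path, line_number, line) for every matching line under root."""
    base = Path(root)
    if not base.is_dir():
        return  # Nothing to scan if the (hypothetical) mount is absent.
    regex = re.compile(pattern)
    # rglob walks subdirectories recursively, like `grep -r --include=...`.
    for path in sorted(base.rglob(glob)):
        if not path.is_file():
            continue
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    yield str(path), lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical mount point of a Scale-Out NAS Storage volume.
    for path, lineno, line in grep_volume("/mnt/scaleout-nas/logs", r"ERROR"):
        print(f"{path}:{lineno}: {line}")
```

The point of the design is that nothing here is storage-specific: the same script works on a local disk or an NFS/SMB mount, which is what lets existing tooling run alongside Hadoop jobs without code rewrites.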
You can also expand or shrink a cluster on the fly without impacting application availability, and perform automatic data rebalancing. Let’s take a closer look at the plug-in in action. From the “Ambari” management console you can start all the services with the click of a button.
We see a number of Hadoop services on the “Ambari” manager node, along with the four nodes in the StoneFly Scale-Out NAS Storage cluster.
In the terminal window we see map and reduce tasks running in real time on the Scale-Out NAS Storage nodes, and the management console confirms that all the work is complete.
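The map and reduce phases in a Hadoop job follow the standard MapReduce pattern. As an illustration only (this is not the job run in the demo), here is a word count written in the style used by Hadoop Streaming: the mapper emits tab-separated `key<TAB>value` lines and the reducer receives them sorted by key. The shuffle/sort step that Hadoop performs between the phases is simulated locally with `sorted()`.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit 'word<TAB>1' for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_pairs: Iterable[str]) -> Iterator[str]:
    """Reduce phase: sum the counts per key; input must be sorted by key."""
    parsed = (pair.split("\t", 1) for pair in sorted_pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines: Iterable[str]) -> list[str]:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["big data", "Big Hadoop"]
    for record in run_local(sample):
        print(record)
```

Running it prints `big 2`, `data 1` and `hadoop 1` (tab-separated). On a real cluster the mapper and reducer would each be a separate stdin/stdout script, and many mapper instances would run in parallel on the storage nodes, which is the activity visible in the terminal.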

The StoneFly Scale-Out NAS Storage plug-in for Apache Hadoop makes it painless and cost-effective to run Apache Hadoop analytics on data where it already resides, eliminating many of the challenges enterprises face when working with the Hadoop Distributed File System.

Get in touch with us to learn more about StoneFly’s Scale-Out NAS Storage.
