Select Page

Comparing Hadoop and SQL

If you have been familiar to working with conventional SQL-databases, hearing someone talk about Hadoop can sound like a crazy mess. In this blog, we are going to try and simplify three of the many differences between conventional SQL sources and Hadoop. Which hopefully will provide context as to where each model is used.

Comparing Hadoop and SQL

Schema on Write vs. Schema on Read

The first difference we want to point-out is schema on-write vs. schema on-read. When we transfer data from SQL-database A to SQL-database B, and before we write the data to database B we need to have some information in hand. For instance, how to adapt the data from database A to fit the structure of database B.  We have to know what the structure of database B is.

Furthermore, we have to make sure that the data being transmitted meets the data types the SQL database is expecting. Database B will reject the data and spit out errors if we try to load something that does not meet its expectation. This is called Schema on-write.

On the other hand, Hadoop has a schema on-read approach. So, when data is written on the HDFS or the Hadoop Distributed File System, it’s just brought in without setting any gate-keeping rules. Then, we apply rules to the code that reads the data when we want to read the data. This code reads the data rather than pre-configuring their structure ahead of time. This concept of schema on-read vs. schema on-write has fundamental implications on how the data is stored in SQL vs. Hadoop, which leads us to our second difference.

How data is stored

In SQL, the data is stored in a logical form with inter-related tables and defined columns. In Hadoop, the data is a compressed file of either data (any type) or text. But, the moment the data enters into Hadoop, the data or file is replicated across multiple-nodes in the HDFS. For example, let’s assume that we’re loading Instagram data and we have a large Hadoop distributed file system cluster of 500 servers. Your Instagram data may be replicated across 50 of them, along with all the other profiles of Instagram users.

Hadoop keeps track of where all the copies of your profile are. This might seem like a waste of space, but it’s actually the secret to the massive-scalability magic in Hadoop. Which leads us to the third difference.

Big data in Hadoop vs. in SQL

A big data solution like Hadoop is architected for an unlimited number of servers. Back to our previous example, and if you’re searching Instagram data, and you want to see all the sentence that include the word “happy”; Hadoop will distribute the calculation of the search request across all the 50 servers where your profile data is replicated in the HDFS.

But, instead of each server conducting the exact same search, Hadoop will divide the workload, so that each server works on just a portion of your Instagram history. As each node finishes its assigned portion of history, a reducer program on the same node cluster adds up all the tallies and produces a consolidated list.

So what happens if any of the 50 servers break-down or has some sort of an issue while processing the request, should the response hold up because of the incomplete data set? The answer to this question is one of the main differences between SQL and Hadoop.

Now Hadoop would say no, and it would deliver an immediate response and finally it would have a consistent-answer.

SQL would say no, before releasing anything we must have consistency across all the nodes. This is called the two-phase commit. Now neither approach is wrong or right, as both approaches play an important role based on the data-type being used.

The Hadoop consistency methodology is far more realistic for reading constantly updating streams of unstructured data across thousands of servers. While the two-phase commit SQL databases methodology is well suited for rolling up and managing transactions and ensure we get the right answer.

Hadoop is so flexible that Facebook  created a program called “Hive” that allowed their team members who didn’t know Hadoop java code to write queries in standard SQL and mimic SQL behavior on-demand.  Today, some of the leading tools for analysis and data management are now beginning to natively write MapReduce programs and prepackaged schemas on-read so that organizations don’t have to hire expensive data-experts to get value from this powerful architecture.

To dive deeper into this subject, please download the following StoneFly white paper on Big Data & Hadoop.


Recent Posts

Compare Array vs Host vs Hypervisor vs Network-Based Replication

Compare Array vs Host vs Hypervisor vs Network-Based Replication

Array-based replication, host-based replication, hypervisor-based replication, and network-based replication are key data replication techniques. In this blog, we explore their features, use cases, advantages, and disadvantages. By understanding these methods, you can...

Comparing High Availability vs Fault Tolerance vs Disaster Recovery

Comparing High Availability vs Fault Tolerance vs Disaster Recovery

High availability, fault tolerance, and disaster recovery are pivotal concepts that enable organizations to achieve uninterrupted service delivery, protect critical data, and swiftly bounce back from unexpected incidents. By implementing these concepts effectively,...

BaaS vs RaaS vs DRaaS Comparison – Which is Best

BaaS vs RaaS vs DRaaS Comparison – Which is Best

Disaster recovery is critical for any organization, but it can be particularly challenging for small and medium-sized businesses that lack the resources to maintain secondary data centers. To address this challenge, disaster recovery vendors offer cloud-based backup...

How to Size Air-gapped and Immutable Storage for Veeam v12

How to Size Air-gapped and Immutable Storage for Veeam v12

Properly sizing storage is a critical aspect of any IT infrastructure, but it becomes even more crucial when it comes to backup and disaster recovery. Veeam v12 is an excellent software for data protection, but to function efficiently, it requires the right amount of...

You May Also Like

Subscribe To Our Newsletter

Join our mailing list to receive the latest news, updates, and promotions from StoneFly.

Please Confirm your subscription from the email