SQL vs. NoSQL? Which database architecture should you use for your next project? Which one is the “best”? Some argue that one is always better than the other. But they are very different technologies that solve different problems.
Let’s take a look at them and see where they differ and where they overlap.
SQL databases support Structured Query Language (SQL), a language for working with data in relational databases. Broadly speaking, “SQL database” and “relational database” refer to the same technology.
A relational database stores data in tables. These tables have columns and rows. The columns define the attributes that each entry in a table can have. Each column has a name and a datatype. The rows are the records in the table.
For example, a table that holds customers might have columns that define the first name, last name, street address, city, state, postal code, and a unique identification code (ID). You could define the first six columns as strings. Or, the postal code could be an integer if all the clients are in the United States. The ID could be a string or an integer.
The relationships between the tables give SQL its power. Suppose you want to track your customer’s vehicles. Add a second table with vehicle ID, brand, model, and type. Then, create a third table that stores two columns: vehicle ID and customer ID. When you add a new vehicle, store its ID with the customer that owns it in this third table. Now, you can query the database for vehicles, for customers, for customers that own certain vehicles, and vehicles owned by customers. You can also easily have more than one vehicle per customer or more than one customer per vehicle.
Three common examples of SQL databases are SQLite, Oracle, and MySQL.
NoSQL database means many things. They’re databases that, well, don’t support SQL. Or they support a special dialect of SQL. Here’s a non-exhaustive list of the more popular NoSQL databases.
Key-Value (KV) databases store data in dictionaries. They can store huge amounts of data for fast insertion and retrieval.
In a KV database, all keys are unique. While the keys are often defined as strings, the values can be any datatype. They can even be different types in the same database. Common examples of values are JSON strings and Binary Large Objects (BLOBs).
Two popular examples of KV databases are Redis and Memcached.
A document store operates like a KV database but contains extra capabilities for manipulating values as documents rather than opaque types.
The structures of the documents in a store are independent of each other. In other words, there is no schema. But, document stores support operations that allow you to query based on the contents.
MongoDB and Couchbase are common examples of document stores.
Relational databases use rows to store their data in tables. What sets column-oriented databases apart from them is — as the name suggests — storing their information in columns. These databases support an SQL-like query language, but they store records and relations in columns of the same datatype. This makes for a scalable architecture. Column-oriented databases have very fast insertion and query times. They are suited for huge datasets.
Apache Cassandra and Hadoop HBase are column-oriented databases.
Graph databases work on the relationships between values. The values are free form, like the values in a document database. But you can connect them with user-defined links. This creates a graph of nodes and sets of nodes.
Queries operate on the graph. You can query on the keys, the values, or the relationships between the nodes.
Neo4j and FlockDB are popular graph databases.
SQL vs. NoSQL Databases: Which One?
So, when you compare SQL and NoSQL databases, you’re comparing one database technology with several others. Deciding which one is better depends on your data and how you need to access it.
Your Data Should Guide Your Decision
Is there a perfect fit for every data set? Probably not. But if you look at your data and how you use it, the best database becomes apparent.
Relational Data Problems
Can you break your data down into entities with logical relationships? A relational database is what you need, especially when you need to perform operations with the relationships.
Relational databases are best when you need data integrity. Properly designed, the constraints that relational databases place on datatypes and relations help guarantee integrity. NoSQL databases tend to be designed without explicit support for constraints, placing the onus on you.
Caching Data Problems
Caching is storing data for repeated access. You usually identify cached data with a single key. NoSQL databases excel at solving caching problems, while relational databases tend to be overkill.
Key-Value stores are an obvious choice for caching problems. Many websites use Redis and Memcached for data and session information.
But a document store that saves documents for historical purposes or reuse is an example of a caching solution, too.
Graph Data Problems
If a graph database stores data with relationships between data, why isn’t it a relational database? It’s because in a graph database relationships are just as important as the data. The relations have fields, names, and directions. Graph queries may include relationships and their names, types, or fields. Relation queries also use wildcards, which account for indirect relationships.
Suppose a database represents rooms in several hosting facilities. It stores buildings, rooms, racks, computers, and networking equipment. This is a relational problem since you have entities with specific relationships.
There could be a table for each entity in a relational database and then join tables representing the relationships between them. But now imagine a query for all the networking equipment in a given building. It has to look in the buildings, find the rooms, look in the rooms for racks, and finally collect all the equipment.
In a graph database, you could create a relation called “contains.” It would be a one-way relation reflecting that one node contains another. Each item in each facility is a node contained by another, except for the buildings. When you query the database for networking gear, a wildcard could combine relationships between the buildings, room, and racks. This query models real life, since you say “Give me all of the gear in building X.”
Scalability: SQL vs. NoSQL
Which technology scales better? NoSQL may have a slight edge here.
Relational databases scale vertically. In other words, data can’t extend across different servers. So, for large datasets, you need a bigger server. As your data increases in size, you need more drive space and more memory. You can share the load across clusters, but not data.
Column-oriented databases were created to solve this problem. They provide horizontal scalability with a relational model.
Key-Value, document, and graph databases also scale horizontally since it’s easier to distribute their datasets across a cluster of servers.
SQL vs. NoSQL: Which One?
SQL and NoSQL are effective technologies. SQL has been around for decades and has proven its worth in countless applications. NoSQL is a set of technologies that solve a variety of different problems. Each of them has its own advantages and tradeoffs.
The question is, which one is best suited for your application? Take the first step by carefully modeling your data and defining use-cases to learn how you need to store and retrieve it. Then, pick the right technology for your application.
Author – Eric Goebelbecker
Eric has worked in the financial markets in New York City for 25 years, developing infrastructure for market data and financial information exchange (FIX) protocol networks. He loves to talk about what makes teams effective (or not so effective!).