When choosing the database technology for an application, the most important question is whether to stick with the good old SQL databases, or follow the trend and choose NoSQL. The answer to this question is not as easy as the names (SQL or not) suggest…
There are lots of checklists out there trying to help you make the right choice, and they are very helpful for quickly shaping our minds around the topic. However, in my experience, this is more than a checklist topic, rather you need a deep understanding of both technologies. If you are from same era as I am, you have received education or gained experience with SQL databases, probably with none or little knowledge of NoSQL databases. For us, data manipulation and storage have always been tied to relational models, until we heard about the seemingly opposite word of NoSQL. It is just natural to first grasp the new concept in the same light as the old model with supposedly the biggest difference to be ‘not having strict schema’, which sounds just like what we needed. However, there is a lot more to it. We need to dive beyond the shape of stored data or the retrieval options such as ‘to JOIN or not’.
Understand NoSQL before making the choice
I come from a project that switched from SQL to NoSQL for the following reasons: flexible schema, affordable cloud support, easier data updates in continuous deployments, and following the new trend in technology. Although all of these are very valid reasons, we made one mistake: we only checked one side of the equation. We thought we just needed to learn the new technology as we were implementing it. However, we faced a huge shift of mindset, in addition to unforeseen difficulties for our specific problem domain. I’m not saying the decision was wrong, in fact we gained a lot of benefits from the switch, but we did get ourselves into a very difficult situation by only changing our data queries instead of adjusting some of our core design. All of this was said to introduce you to the main point of this article: understand NoSQL before making the choice. This will be a lengthy article, but it is the only way I can hopefully explain how this decision should be made and to help you along the road of implementing a NoSQL application (if that ends up being the right choice for your business).
4 Things to consider when choosing your database technology
There are four main concerns to consider when choosing your database technology: scalability, schema design, transaction support, and CAP theorem considerations. We’ll dig into each of them. Do note that, other than these four main considerations, there are many subtle differences that you need to be aware of. I will try to point to as many as the extent of this article allows.
In this post we focus on schema design. Please stay tuned for the next posts that will cover the rest of the aspects. I will update this article with links to the other posts in the coming weeks.
1. Schema design
Flexible schema design seems to be the first criteria that comes to developers’ minds when choosing between SQL and NoSQL. Although this is a totally valid criterion, this shouldn’t be the only deciding factor. As you’ll see, there are more design choices to be made other than a better fit for the data model. But, let’s focus on this aspect for now.
NoSQL databases allow for flexible schema, meaning that the stored data can vary without the database operation being impacted. On the opposite side, in SQL databases, the database is aware of the schema and doesn’t let application logic go beyond that defined schema. This is very useful if the application data does not conform to a strict schema, for example if there are many optional fields or if new fields are added dynamically.
Another benefit of the schema-less database is the flexibility in performing updates to the application data design without the need for updating data that is already stored. A challenge in SQL databases is updating the schema on the fly when data is being added to the system all the time. In NoSQL databases, this problem is non-existent since you don’t need to touch existing data as their schema is still valid. However, this is only from database’s point of view; the challenge in NoSQL databases is the ability to deal with all of these old data forms in the application logic.
NoSQL databases scale well due to their distributed nature and parallel processing. So, do keep in mind that when you’re searching through documents, several parallel processes are examining the documents they have access to, typically without talking to each other until they finish execution. This can affect how you think about writing your search queries or views. Also, the distributed storage makes it difficult to perform JOINs efficiently as it will require heavy communication between the nodes. For this reason, database JOINs are not fully supported in NoSQL databases. For example, MongoDB has added support for JOINs, but it can only do limited forms as part of data aggregation (see here). So, when a query requires analyzing stored documents, it would help if you are able to extract all you need from one document. Otherwise, the application will need to loop through the result set from one query and then perform subsequent queries to achieve the equivalent of SQL’s JOIN operation.
Other than performing JOINs in the application logic, the most common recommended approach is data de-normalization. This means that if you need information from two objects you need to copy some, or all, of the information from one object into the other one, so that you only need to query one of them and get all the info you need. For example, if you need to find all books written by Chinese authors, without an application-level join, your books should include not only the author name/id but the author’s nationality. The same example in SQL databases would be implemented in a normalized form, meaning only a key from an Author table would be present in the Book table, and the author’s information will not be repeated anywhere other than the Author’s table. The de-normalization method is basically pre-populating the database with the result of the JOIN operation that it will need, by designing the data in a way that related data exist together in one document. This can potentially lead to large documents, which increase the risk of conflicts as we will discuss later when exploring the CAP theorem. On the other hand, as long as related data exist together without repetition, it is desirable from a transactions point of view (this will also be discussed later). Another important side effect of data de-normalization is the existence of repeated data across multiple documents, which could cause inconsistency in data. This is one of the challenges that need to be either addressed or simply accepted as a possible outcome.
To most people, ‘flexible schema support’ translates to the ability to use JSON to store the data, as document-based NoSQL databases do. Let’s not forget that relational databases also support JSON data and can query on them. So, the fact that a data model fits well to JSON (flexible form), doesn’t necessarily mean that the application is a good fit for NoSQL. On the other hand, the strength of relational databases is mainly in its relational model and in the optimizations done by the engine, which is not fully utilized when working with JSON data. However, sometimes using the JSON support of SQL database is enough for the purpose of an application, if not all data in that application needs the flexibility of JSON.
To wrap up:
Considering all above points, while enjoying the benefit of NoSQL database’s flexible schema, we should note that we need proper design for the following two key aspects: the data retrieval and JOINs; and the application logic for how to deal with or maintain different old and future data forms that coexist in the database. So, although NoSQL data is schema-less, it should not be design-less!
We’ll get into the details of complications of CAP theorem, scalability, and transactions in the future posts. Please stay tuned! To be continued …
Please share your thoughts, questions, and relevant experience in the comments section.