When an organization or a developer chooses a database, the obvious choice has long been an RDBMS (Oracle, SQL Server, MySQL, etc.). That changed around 2009, when a new trend reshaped the landscape of database engines. The trend was named NoSQL. Many people read the term as “No SQL” (a rejection of SQL), but it is more usually expanded as “Not Only SQL”, which signifies that both can coexist and that each has its own benefits and challenges. Today, companies like Google, Facebook, Twitter, Viber, Amazon and LinkedIn use NoSQL in one way or another.
In this article we will explore the world of NoSQL and see how it is impacting the database landscape. An RDBMS offers a set of properties called ACID (Atomicity, Consistency, Isolation and Durability), whereas the properties of NoSQL systems are usually referred to as BASE (Basically Available, Soft state, Eventual consistency). In short, ACID favours strict consistency for every transaction, while BASE accepts temporarily stale data in exchange for availability and scale. We will return to this later in the article.
What is the need for NoSQL?
The phrase “Necessity is the mother of invention” still holds true. There was a pressing need for storage that does not revolve around the relational model, and the following needs led to the invention of NoSQL:
- Huge and growing user base: A few years ago, a few thousand users was the practical limit for an application, and nobody complained. Today, with the advent of smartphones and tablets, the user base has grown many times over. The global online population is more than ~2 billion people, who spend ~35 billion hours online. Traffic on these sites and apps is also sporadic: a mobile app can gain a million users overnight and lose them all within a few days, and seasonal swings (such as Christmas) add to this. To accommodate this, a database technology that scales easily was needed. Traditional RDBMSs struggled in this area, and NoSQL came with a solution.
- Huge and growing “unstructured” data online: As the number of online users grew rapidly, so did the amount of data generated. An application that cannot process this data quickly and give the end user meaningful information will lose customers. Earlier, most data was structured; now much of it is “unstructured”, in the form of text, log files, click streams, blogs, tweets, audio and video. An RDBMS is not the right tool for capturing such data, because it requires the data to be rigidly defined against a schema. NoSQL accommodates this need as well.
- High cost of RDBMS: Unlike an RDBMS, a NoSQL database typically uses clusters of inexpensive servers to manage the data. An RDBMS often relies on expensive, proprietary servers and storage.
- Cloud computing: Almost all web applications and mobile apps use a three-tier Internet architecture. Traditionally the database tier has been an RDBMS and hence centralized. Given the needs above, developers wanted the data to be decentralized. NoSQL database engines are designed to be distributed and are hence better suited to today’s Internet applications.
What is NoSQL?
First and foremost, we need to understand that NoSQL will not replace RDBMS; it is a complementary technology. Each has its own pros and cons and suits different types of applications. NoSQL is designed for applications that hold huge amounts of unstructured data and are distributed. Take, for example, applications like Facebook and Twitter, which have millions of users accumulating terabytes of data on a daily basis. Unlike an RDBMS, a NoSQL database typically has the following characteristics:
- No fixed schema for the data.
- No relational joins between records.
- Scale-out by spreading the load over many inexpensive commodity servers (physical and virtual).
- Faster searches than an RDBMS in some cases.
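The scale-out point in the list above can be illustrated with a minimal sketch. It assumes a simple hash-based partitioning scheme, which is only one of several ways a NoSQL engine might spread records over servers; the names shard_for, put, get and the four in-memory "servers" are hypothetical and exist only for this illustration.

```python
import hashlib

def shard_for(key: str, num_servers: int) -> int:
    """Map a record key to one of num_servers servers using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers

# Hypothetical cluster: four commodity servers, each modelled as a plain dict.
servers = [dict() for _ in range(4)]

def put(key: str, value: dict) -> None:
    """Store the record on whichever server its key hashes to."""
    servers[shard_for(key, len(servers))][key] = value

def get(key: str):
    """Read the record back from the same server; None if it is absent."""
    return servers[shard_for(key, len(servers))].get(key)

# The two records need not share the same fields (no fixed schema).
put("user:1001", {"name": "Navin Saini"})
put("user:1002", {"name": "Rahul", "mail": "R@hcl.com"})

print(get("user:1002"))
print([len(s) for s in servers])  # how many records each server holds
```

Because the server is derived from the key alone, any client can locate a record without consulting a central node. Real engines refine this idea, for example to limit data movement when servers are added or removed.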
There are primarily four database technologies that are part of NoSQL: document store, wide column store, key-value store and graph. We give a brief overview of these four types in the next section.
Categories of NoSQL
As described above, the current NoSQL world is divided into four broad categories:
Document Store Database
A document-oriented database is, as the name suggests, used for storing, retrieving and managing data in the form of documents. It is among the most widely adopted of the NoSQL database engines. Document store databases assume that the data is encapsulated and encoded in a standard format. Common encodings are XML, JSON, BSON and YAML; binary formats such as PDF, DOC and XLS can be stored as well. Example:
Employee Name: “Navin Saini”
SAP ID: “1234567”
Mail ID: firstname.lastname@example.org
Each document is addressed in the database by a unique key, which can be a string or a path. An index is maintained on these keys for faster lookup. Another useful feature is a set of APIs for fast querying within documents.
Available Document Store DB in market: MongoDB, CouchDB, Couchbase, RavenDB
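The idea of addressing a document by a unique key and querying inside it can be sketched in a few lines. This is a toy in-memory model, not the API of any of the products listed; the functions save, load and find and the key format "emp/..." are assumptions made for the illustration.

```python
import json

# Hypothetical in-memory document store: unique key -> JSON-encoded document.
store = {}

def save(doc_id, document):
    """Encode the document as JSON and file it under its unique key."""
    store[doc_id] = json.dumps(document)

def load(doc_id):
    """Fetch a document directly by its key and decode it."""
    return json.loads(store[doc_id])

def find(field, value):
    """Return the keys of all documents whose field equals value."""
    return [doc_id for doc_id, raw in store.items()
            if json.loads(raw).get(field) == value]

# Documents in the same store may carry different fields (no fixed schema).
save("emp/1234567", {"Employee Name": "Navin Saini", "SAP ID": "1234567"})
save("emp/23456", {"Employee Name": "Rahul", "SAP ID": "23456",
                   "Mail ID": "R@hcl.com"})

print(load("emp/1234567")["Employee Name"])
print(find("Mail ID", "R@hcl.com"))
```

Here find scans every document, which is fine for a sketch; a real document store would typically use secondary indexes to answer such queries quickly.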
Wide Column Store
A column-oriented database stores a table as sections of columns of data, as opposed to rows of data in an RDBMS. This type of storage is well suited to data warehousing, CRM systems and library card catalogs, where aggregation is performed over a large number of similar data items.
Essentially, this technology concerns how the information is laid out on disk. In an RDBMS the data is stored row by row, for example 001:Navin Saini,1234567,email@example.com; 002:Rahul,23456,R@hcl.com. In a column store the values of each column are stored together, for example Navin Saini:001; Rahul:002. This layout speeds up queries that read only a few columns across many rows.
Available Wide Column Store DB in Market: Cassandra, HBase, Accumulo
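The difference between the row-wise and column-wise layouts described above can be sketched as follows. This is a simplified in-memory picture, not how any listed product stores data on disk; the field names name and sap_id and the helper to_columns are assumptions for the illustration.

```python
# Row-wise layout: all fields of one record are kept together.
rows = [
    {"id": "001", "name": "Navin Saini", "sap_id": 1234567},
    {"id": "002", "name": "Rahul", "sap_id": 23456},
]

def to_columns(records):
    """Regroup row-wise records so that each column's values sit together."""
    cols = {}
    for rec in records:
        for field, value in rec.items():
            if field == "id":
                continue  # the row id becomes the key inside each column
            cols.setdefault(field, {})[rec["id"]] = value
    return cols

# Column-wise layout: one mapping per column, keyed by row id.
columns = to_columns(rows)

# An aggregate over one column touches only that column's values.
largest = max(columns["sap_id"].values())

print(columns["name"])
print(largest)
```

With the row layout, the same aggregate would have to read every full record, including the fields it does not need.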
Key-Value Store
Key-value stores are the simplest NoSQL databases available. A key-value store is a schema-less construct that holds pairs of a key and its associated value. The same concept is widely used in programming (hash maps or dictionaries). Because each record has a unique key, lookups by key are fast. Example:
Key1234_Name : Navin Saini
Key1234_ID : 123456
Key1234_mail : firstname.lastname@example.org
Key5678_Name : Rahul
Key5678_ID : 343544
Key5678_mail : R@hcl.com
Available Key-Value Store DB in market: Redis, Memcached, Riak, DynamoDB
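The key-value pairs shown above map naturally onto a dictionary. The following is a minimal sketch of the put/get/delete interface such a store might expose; the class KeyValueStore and its method names are hypothetical and do not correspond to the API of any product listed.

```python
class KeyValueStore:
    """Toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Insert the value, or overwrite it if the key already exists."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look the value up directly by its key."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key if present; do nothing otherwise."""
        self._data.pop(key, None)

kv = KeyValueStore()
kv.put("Key1234_Name", "Navin Saini")
kv.put("Key1234_ID", 123456)
kv.put("Key5678_Name", "Rahul")

print(kv.get("Key1234_Name"))
kv.delete("Key5678_Name")
print(kv.get("Key5678_Name", "not found"))
```

The store never inspects the value, which is what keeps the model simple: all access goes through the key.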
Graph Database
A graph database is a database engine that uses nodes, edges and properties to store and retrieve data. This type of database provides index-free adjacency: every element contains a direct pointer to its adjacent elements, so no index lookups are required to move between them. These databases can be faster than an RDBMS when the data is highly connected, because following relationships does not need expensive join operations.
Available Graph DB in market: Neo4j, OrientDB, Titan
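Index-free adjacency, as described for graph databases above, can be sketched with nodes that hold direct references to their neighbours. This is an illustrative model only; the class Node, the helper reachable, the "knows" relationship and the third person "Priya" are all made up for the example.

```python
from collections import deque

class Node:
    """A graph node with properties and direct references to its neighbours."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.edges = []  # list of (label, Node) pairs

    def connect(self, label, other):
        """Add a directed, labelled edge from this node to other."""
        self.edges.append((label, other))

def reachable(start, label):
    """Breadth-first walk along edges with the given label.

    Each hop follows an object reference held by the node itself,
    so no global index or join is consulted.
    """
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        for edge_label, target in node.edges:
            if edge_label == label and target not in seen:
                seen.add(target)
                found.append(target.name)
                queue.append(target)
    return found

navin = Node("Navin Saini", sap_id="1234567")
rahul = Node("Rahul")
priya = Node("Priya")  # hypothetical third person

navin.connect("knows", rahul)
rahul.connect("knows", priya)

print(reachable(navin, "knows"))
```

In a relational model the same question ("whom can Navin reach through people he knows?") would need one self-join per hop.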
Challenges for NoSQL
NoSQL has generated a lot of buzz in the technology world, but there are many obstacles to overcome before it can pose a serious threat to RDBMS. A few of these obstacles are listed below:
- Maturity of NoSQL solutions: RDBMS has been around for a very long time, and that is a reassuring fact for organizations choosing a database.
- Support and collaboration: Most organizations and developers need reassurance that if something goes wrong they can get help from the online community and from vendors, as they can with an RDBMS. Most NoSQL databases are open source, and support is limited.
- Analytics and BI: RDBMS has very good compatibility with BI and analytics tools, and many such tools are available. The tooling for BI and analytics on NoSQL is far less developed, and even simple reports can require significant programming effort.
- Developer expertise: There are millions of developers in the world who are experts in RDBMS concepts and programming. Most of the NoSQL workforce is still in the learning phase.
NoSQL technology is gaining traction within organizations and businesses that are struggling to cope with the explosion of data and new data types, demands that RDBMS struggles to meet. That said, NoSQL is not the answer to every problem. As highlighted at the beginning of the article, there is enough room for both to coexist, and each has its respective strengths and weaknesses.