ElasticSearch Vs Solr | HCLTech

ElasticSearch Vs Solr
July 26, 2016

Overview

When it comes to big data search, the primary choice is between Solr and Elasticsearch. Both are open source enterprise search platforms that can perform full-text and faceted searches.

Background

Solr and ElasticSearch are competing search servers. Both ElasticSearch and Solr are built on top of Lucene, so many of their core features are identical.

Lucene is a search engine library packaged as a set of JAR files. Many custom applications embed the Lucene JAR files directly and create and search their Lucene index manually through the Lucene APIs.
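To make the idea of an index concrete, here is a minimal sketch of an inverted index, the kind of structure a full-text engine builds. It is not the Lucene API: the `TinyIndex` class and `tokenize` helper are hypothetical names used only for illustration, and real analyzers, scoring and storage are far more involved.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (real analyzers do far more)."""
    return text.lower().split()


class TinyIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add(1, "Solr is built on Lucene")
index.add(2, "Elasticsearch is built on Lucene")
index.add(3, "ZooKeeper coordinates SolrCloud")
print(index.search("built lucene"))
```

An application that embeds Lucene directly has to manage this kind of index itself, which is the work Solr and Elasticsearch wrap behind a server.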

Solr and ES take those Lucene APIs, add features on top of them, and make the APIs accessible through an easy-to-deploy web server (like Tomcat or Jetty).

Wondering when to choose Solr and when to choose ElasticSearch?

It would be beneficial to take a deeper look and compare the two leading open source search engines built on top of Lucene.

Feature drill down - ElasticSearch Vs Solr

Basics

Many servers connected together form a cluster and a single instance is called a node. The main logical data structure for Solr is called the Collection, which is composed of Shards (Lucene indices).

A single Collection can have multiple Shards, and Shards can live on different Nodes. Thus, a single Collection can be spread across multiple Nodes, giving a distributed environment. A Collection can also have Replicas, which are exact copies of its Shards; their main purpose is to enable scaling and to duplicate data in case of node failures, thus providing high availability.
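The relationship between documents, Shards, Replicas and Nodes can be sketched roughly as follows. This is a simplified illustration, not how Solr or Elasticsearch actually route documents: the MD5-based hash, the node names and the round-robin placement are all assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 3
NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Map a document id to a shard with a stable hash."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def placement(num_shards=NUM_SHARDS, nodes=NODES):
    """Put each shard's primary and one replica on two different nodes."""
    layout = {}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard + 1) % len(nodes)]
        layout[shard] = {"primary": primary, "replica": replica}
    return layout


layout = placement()
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    shard = shard_for(doc_id)
    print(doc_id, "-> shard", shard, layout[shard])
```

Because the replica of each shard sits on a different node than its primary, losing any single node still leaves one copy of every shard available.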

Advantage to Elastic

Additionally, Elasticsearch can hold multiple types of documents in a single index, so documents with different structures can be indexed together. Elasticsearch distinguishes those Types during both indexing and querying. To achieve the same with Solr, you would have to simulate it inside your application or develop a custom search component.

Configuration

In Solr, the configuration of all components is defined in the solrconfig.xml file, and after each change a restart or reload of the Solr node is needed.

In ElasticSearch, the configuration is done in the elasticsearch.yml file.

Advantage to elastic

Many settings exposed by ElasticSearch can be changed on a live cluster without restarting the ElasticSearch nodes.
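As a rough illustration of a live settings change, the sketch below builds (but does not send) a request to Elasticsearch's cluster update settings endpoint, `/_cluster/settings`. The node address and the `build_settings_request` helper are assumptions for this example, and the setting shown is just one dynamic setting chosen for illustration; check the documentation of your version for which settings are dynamic.

```python
import json
import urllib.request

# Assumption: a node is reachable at this address; adjust for your cluster.
ES_URL = "http://localhost:9200"


def build_settings_request(settings, base_url=ES_URL):
    """Build (but do not send) a PUT request to the cluster settings API."""
    body = json.dumps({"persistent": settings}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_settings_request({"cluster.routing.allocation.enable": "all"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To apply it against a running cluster: urllib.request.urlopen(request)
```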

Shard Rebalancing

As you add new machines, ElasticSearch will automatically load balance and move shards to new nodes in the cluster. This automatic shard rebalancing behavior does not exist in Solr.
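The idea of rebalancing can be sketched with a toy allocator that moves shards from the fullest node to the emptiest one until the counts are even. This is only an illustration under simplifying assumptions (equal-sized shards, no replicas, no allocation rules); it is not the algorithm ElasticSearch uses, and the `rebalance` function is a hypothetical helper.

```python
def rebalance(assignment):
    """Move shards from the fullest node to the emptiest until sizes differ by at most one.

    `assignment` maps node name -> list of shard names and is modified in place.
    Returns the list of (shard, source, target) moves performed.
    """
    moves = []
    while True:
        fullest = max(assignment, key=lambda n: len(assignment[n]))
        emptiest = min(assignment, key=lambda n: len(assignment[n]))
        if len(assignment[fullest]) - len(assignment[emptiest]) <= 1:
            return moves
        shard = assignment[fullest].pop()
        assignment[emptiest].append(shard)
        moves.append((shard, fullest, emptiest))


cluster = {"node-1": ["s0", "s1", "s2"], "node-2": ["s3", "s4", "s5"]}
cluster["node-3"] = []  # a new, empty machine joins the cluster
moves = rebalance(cluster)
for shard, source, target in moves:
    print(f"move {shard}: {source} -> {target}")
print({node: len(shards) for node, shards in cluster.items()})
```

In this run, one shard from each of the two original nodes ends up on the new node, leaving two shards per node.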

Advantage to elastic

Nested Typing

Solr does not support nested typing; the document structure must be flat.

Advantage to elastic

ElasticSearch supports complex nested types.
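Why nesting matters can be shown with a small sketch that compares a flattened document with one that keeps its inner objects intact. The document, field names and matching functions are hypothetical and only model the behaviour; they are not ElasticSearch or Solr query code.

```python
post = {
    "title": "Search engines compared",
    "comments": [
        {"author": "alice", "stars": 2},
        {"author": "bob", "stars": 5},
    ],
}


def flatten(doc):
    """Flatten the comment list into parallel fields, losing which values belong together."""
    return {
        "title": doc["title"],
        "comments.author": [c["author"] for c in doc["comments"]],
        "comments.stars": [c["stars"] for c in doc["comments"]],
    }


def match_flat(doc, author, stars):
    """Match if the author and the star count each appear anywhere in the document."""
    return author in doc["comments.author"] and stars in doc["comments.stars"]


def match_nested(doc, author, stars):
    """Match only if a single comment has both the author and the star count."""
    return any(c["author"] == author and c["stars"] == stars for c in doc["comments"])


# Did alice leave a 5-star comment? She did not, but the flat form says yes.
print(match_flat(flatten(post), "alice", 5))
print(match_nested(post, "alice", 5))
```

The flat form reports a match because "alice" and 5 both occur somewhere in the document, while the nested form correctly reports none.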

Distributed Group By

Solr supports distributed group by (including grouped sorting, filtering, faceting, etc.) while ElasticSearch does not.
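A distributed group by can be pictured as grouping on every shard and then merging the partial results on a coordinating node. The sketch below keeps only the best document per group; it assumes in-memory lists standing in for shards and is not Solr's actual implementation. The function names and sample data are hypothetical.

```python
def top_per_group(docs, group_field, sort_field):
    """Return the highest-scoring document for each group value."""
    best = {}
    for doc in docs:
        key = doc[group_field]
        if key not in best or doc[sort_field] > best[key][sort_field]:
            best[key] = doc
    return best


def distributed_group_by(shards, group_field, sort_field):
    """Group on each shard, then merge the per-shard winners on a coordinator."""
    partial_winners = []
    for shard_docs in shards:
        partial_winners.extend(top_per_group(shard_docs, group_field, sort_field).values())
    return top_per_group(partial_winners, group_field, sort_field)


shards = [
    [{"id": 1, "brand": "acme", "score": 0.4}, {"id": 2, "brand": "zenith", "score": 0.9}],
    [{"id": 3, "brand": "acme", "score": 0.7}, {"id": 4, "brand": "zenith", "score": 0.2}],
]
groups = distributed_group_by(shards, "brand", "score")
for brand, doc in sorted(groups.items()):
    print(brand, "-> doc", doc["id"])
```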

Advantage to Solr

Percolation Queries

ElasticSearch allows you to register certain queries that can generate notifications when the indexed documents match that query. This is really great for things like alerts.
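Percolation inverts the usual flow: queries are stored first, and each incoming document is checked against them. A minimal sketch of that idea follows, assuming queries are plain sets of required terms. The `Percolator` class is hypothetical and is not the ElasticSearch percolator API.

```python
class Percolator:
    """Toy percolator: stores queries and reports which ones a new document matches."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        """Register a query that matches documents containing all required terms."""
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, text):
        """Return the sorted ids of registered queries matched by the document text."""
        words = set(text.lower().split())
        return sorted(qid for qid, terms in self.queries.items() if terms <= words)


percolator = Percolator()
percolator.register("disk-alert", ["disk", "full"])
percolator.register("login-alert", ["login", "failed"])
print(percolator.percolate("Disk almost full on node-3"))
print(percolator.percolate("User login failed twice"))
```

An alerting system would send a notification for each query id returned when a new document arrives.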

Advantage to elastic

Community

Solr contributors and committers come from a number of different organizations, while Elasticsearch committers are from a single company.

Advantage to Solr

A number of Hadoop distributors, Cloudera, Hortonworks, and MapR among them, have chosen Solr over Elasticsearch as their horse in the search race, even though they have also partnered with Elasticsearch.

Quick Comparison

Elasticsearch, unlike Solr, was built with distribution in mind and designed to be EC2-friendly: it runs a search index across multiple servers in a fail-safe and efficient way, which is quite a challenge.

ElasticSearch has been designed with the cloud era in mind. Even though some steps have been taken to make Solr cloud-ready, its initial architecture and design did not account for it, so it will take more time to get Solr to where Elasticsearch is out of the box. Performance-wise, they are roughly the same. Operationally, Elasticsearch is a bit simpler to work with because it runs as a single process, whereas Solr, in its Elasticsearch-like fully distributed deployment mode known as SolrCloud, depends on Apache ZooKeeper. If you love monitoring and metrics, Elasticsearch is the better choice. Notable users of Elasticsearch include Wikimedia, Facebook, StumbleUpon, Mozilla, Amadeus IT Group, Quora, Foursquare, Etsy, SoundCloud, GitHub, FDA, CERN, Stack Exchange, and Netflix.

Conclusion

Solr is a search server suited to standard search applications where massive indexing and real-time updates are not required. Elasticsearch, on the other hand, takes it to the next level with an architecture aimed at building modern real-time search applications. Percolation is an exciting and innovative feature. Elasticsearch is scalable and speedy, and if distributed indexing is needed, Elasticsearch would be the right choice.

If you’ve already invested a lot of time in Solr, stick with it, unless there are specific use cases that it just doesn’t handle well.

If you need a data store that can handle analytical queries in addition to text searching, Elasticsearch is a better choice.

In the end, Solr and ElasticSearch are very close to each other in feature sets, and it is difficult to choose one over the other without knowing the exact requirements.

References

  1. https://sematext.com/blog/2012/08/23/solr-vs-elasticsearch-part-1-overview/
  2. http://www.datanami.com/2015/01/22/solr-elasticsearch-question/
  3. https://en.wikipedia.org/wiki/Elasticsearch
  4. https://dzone.com/articles/solr-vs-elasticsearch
  5. https://www.loggly.com/blog/loggly-chose-elasticsearch-reliable-scalable-log-management/
  6. https://thinkbiganalytics.com/solr-vs-elastic-search/
  7. http://opensourceconnections.com/blog/2016/01/22/solr-vs-elasticsearch-relevance-part-two/
