|
SOLR |
ELASTICSEARCH |
Use Cases |
- Search for large bulk data sets, for example, healthcare (payer / provider), biopharma research, finance, and government
- Native unformatted record filter and search, such as e-commerce or customer-facing search
- Static data set searching
- Large bulk reprocessing
|
- Log analytics: enterprise log consumption and analysis or a replacement option for commercial off-the-shelf log analytics products
- Real-time dashboards for operational timeline or sales and marketing insights
- High-volume data streams with natural language content from social media and IoT streams
- Native unformatted record filter and search, such as e-commerce or customer-facing search
|
Visualization Tools |
- Banana (a Kibana port) supports Solr versions up to 6.x
- Apache Hue (mostly used in Hadoop deployments) – emerging functionality with Hue Search App
|
- Robust visualization development framework with Kibana
- Maintained and version-matched by Elastic
- Well-integrated with Grafana for analytics and monitoring
|
Cloud and Big Data |
- Cloud-based deployments rely heavily on management tools like Cloudera and Hortonworks
- Fully-hosted options are available through third-party vendors
- As an Apache project, Solr integrates well with other Apache products, especially those supported in Hadoop
|
- Fully-hosted and managed solutions are provided by all the major cloud infrastructure providers (Microsoft Azure, AWS, Google Cloud)
- Management tools are provided by the cloud hosting provider
- Elasticsearch Hadoop libraries allow for the integration of Hadoop components with Elasticsearch natively
|
Cognitive Search Capabilities and Integration |
- Learning to Rank (LTR) module is supported in Solr 6.4 or later
- As an Apache project, Solr integrates well with OpenNLP (as an external rather than embedded component) for entity extraction and tagging to feed concept-based search
|
- Includes a Machine Learning component (with X-Pack)
- Allows for pattern recognition and time series forecasting (ML and Kibana)
- Learning to Rank (LTR) plugin supports machine-learning-driven relevancy tuning exercises
- OpenNLP can be used as an external component, as with Solr, to support cognitive search functions
|
Management and Operations |
- Overall, more difficult to manage (though Cloudera Manager helps with this in a Hadoop environment)
- Monitoring APIs are not readily available (Solr 7 supports metrics APIs, but they require JMX)
- Scaling requires manual intervention for shard rebalancing (Solr 7 has an auto-scaling API giving some control over shard allocation and distribution)
|
- Easy to set up and scale
- Automatic shard rebalancing after node addition
- APIs provide ease of monitoring and state evaluation
- X-Pack provides out of the box resource dashboards (requires licensing from Elastic)
|
Development Architecture |
- Excellent pluggable architecture
- Plugins can be easily developed and integrated
- Fully open source with vast community support
- Tight integration with Lucene development
|
- More restrictive plugin architecture
- Plugins are not supported in hosted environments
- Recently became fully open source with Elasticsearch core and X-Pack (X-Pack code has been released as open source, but still requires commercial licensing to implement)
- Lags slightly in implementing new Lucene features
- Frequent point releases with feature additions
|
Cluster State Management |
- Zookeeper Quorum: minimum 3 nodes required; 5 to 7 recommended depending on the overall size of the cluster
|
- Master Nodes (proprietary solution): minimum 3 nodes required. They can exist as independent nodes or dual-role nodes with data nodes
|
Security |
- Implemented in 3 flavors: basic (username/password in Zookeeper), Hadoop authentication (LDAP), or Kerberos
- LDAP / Active Directory is not supported directly
- Custom plugins can be developed
|
|
Bulk Indexing Tools |
- Batch API operations
- Within Cloudera Hadoop: MapReduceIndexerTool (Solr 4.x); Lily HBase batch indexing; and Spark CrunchIndexerTool
- MapReduceIndexerTool (5.x) from Lucidworks
|
- Bulk API operations only
- Configuration modifications can be made to speed up initial bulk indexing
|
Near Real Time (NRT) Indexing (not a comprehensive list) |
|
- Beats framework
- Logstash
- Ingest Nodes
- Kafka Connect Elasticsearch Sink
- Spark Streaming
- Apache NiFi/MiNiFi
- Accenture Aspire for unstructured data processing and enrichment
|
Analytics |
- Strong facet-based analytics
- JSON facets support more dynamic aggregations with analytic functions
- Stream Expressions (available in Solr 7) provide a streaming framework for parallel computation, emitting results for downstream processing
|
- Strong analytic capabilities with aggregations
- Supports analysis on top of aggregations (e.g. moving averages)
- Provides time-series analysis of continually added data (like logs or social media streams) for trend and efficacy insights
|
Nested Data Structures |
- Has the notion of parent-child document relationships
- These exist as separate documents within the index, limiting their aggregation functionality in deeply-nested data structures
|
- Deep nesting is well-supported
- Fully-structured JSON documents can be directly persisted into Elasticsearch
- Aggregations can be performed against nested structures easily
|
Query Operations |
- Mostly limited to query URI parameters, leading to complex queries (debuggable in Solr Admin)
- JSON API (available in Solr 7) allows JSON-based query expressions
- Request handlers can be defined simply in Solr configuration and Java to perform specific and complex tasks related to a given query use case
|
- Full-featured Query DSL for writing and expressing complex queries
- Queries can be expressed only in JSON
- Custom request handlers require the development of a plugin. There is no notion of jar references from a custom endpoint as there is in Solr
|
API Interaction |
- SolrJ (Java) is the best-maintained and most up-to-date client, and is maintained as part of the Apache project
- Other Apache-maintained APIs: Flare, PHP, Python, Perl
- Other language APIs exist but are community-maintained and often lag behind SolrJ in functionality (most notably the .NET API)
|
- Many APIs are developed and supported directly by Elastic (Java, JavaScript, Groovy, .NET, PHP, Perl, Python, Ruby)
- Other community APIs exist for Elasticsearch (e.g., C++, Erlang, Go, Haskell, Lua, Perl, R)
|
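
To make the Query Operations row more concrete, the sketch below builds roughly comparable requests in both styles: URI parameters for Solr and a JSON Query DSL body for Elasticsearch. It is a minimal illustration, not a definitive mapping between the two engines. The field names (`title`, `category`, `price`) and the function names are hypothetical, and nothing is sent to a server.

```python
import json
from urllib.parse import parse_qs, urlencode

# Hypothetical field names ("title", "category", "price") chosen for illustration.


def solr_select_params(text, category, max_price):
    """Build the URI query string for a Solr select request."""
    return urlencode(
        {
            "q": f"title:{text}",
            # Each filter query becomes its own repeated fq parameter.
            "fq": [f"category:{category}", f"price:[* TO {max_price}]"],
            "rows": 10,
        },
        doseq=True,
    )


def es_search_body(text, category, max_price):
    """Build a roughly comparable Elasticsearch Query DSL body as JSON."""
    return json.dumps(
        {
            "size": 10,
            "query": {
                "bool": {
                    "must": [{"match": {"title": text}}],
                    "filter": [
                        {"term": {"category": category}},
                        {"range": {"price": {"lte": max_price}}},
                    ],
                }
            },
        }
    )


if __name__ == "__main__":
    query_string = solr_select_params("laptop", "electronics", 500)
    print(query_string)            # URL-encoded Solr parameters
    print(parse_qs(query_string))  # decoded view of the same parameters
    print(es_search_body("laptop", "electronics", 500))
```

The Solr version packs its logic into repeated `fq` parameters with their own range syntax, while the Elasticsearch version nests the same conditions inside a `bool` query, which is the trade-off the table describes between URI parameters and a JSON-only DSL.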