For any content management system (CMS), search is the most important feature. If documents are stored in a CMS, they should be easily searchable based on the various metadata associated with them. That is one of the main reasons many customers use a content management system; otherwise they would prefer to store their documents on a shared drive. Alfresco is backed by one such search engine: Solr. Alfresco used Lucene as its search engine from the beginning and later adopted Solr as well to extend its search capabilities. So, first, let's start with a brief introduction to Solr.
What is Solr?
Apache Solr is an open source enterprise search server that has been around long enough to be mature, and it powers search on sites such as CNET and Netflix. It uses Apache Lucene as its indexing and search engine. It is written in Java and provides plug-in interfaces for building extensions to the search server. It can run in a servlet container such as Apache Tomcat, and clients talk to Solr over HTTP (for example by sending XML), with Solr responding in XML or in other formats such as JSON.
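To make the HTTP interface concrete, here is a minimal Java sketch that builds a Solr query URL asking for a JSON response. It assumes Solr's standard `select` request handler, a base URL of `http://localhost:8080/solr`, and a core named `alfresco` (taken from the admin URL later in this post); the query `name:report` is purely illustrative. In a real Alfresco installation the Solr endpoint may be secured, and searches normally go through Alfresco's own search API, so treat this only as an illustration of how a client talks to Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SolrQueryUrl {

    // Builds a Solr "select" URL for the given core and query parameters.
    // The base URL and core name passed in are assumptions for illustration.
    public static String build(String baseUrl, String core, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return baseUrl + "/" + core + "/select?" + query;
    }

    // URL-encodes one query-string component as UTF-8 (':' becomes %3A, space becomes '+').
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "name:report");                      // illustrative field query
        params.put("wt", "json");                            // ask Solr to answer in JSON
        System.out.println(build("http://localhost:8080/solr", "alfresco", params));
    }
}
```

Sending this URL with any HTTP client (for example `java.net.http.HttpClient`) would return the matching documents; changing `wt` selects a different response writer, such as `xml`.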
This is how Solr and Alfresco are integrated.
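One point worth making explicit is the direction of indexing: Solr does not index inside Alfresco's transactions; it periodically polls the repository for transactions committed since the last one it indexed. The Java sketch below models that polling loop with hypothetical in-memory stand-ins (`Txn`, the `repositoryLog` list, `trackOnce`). It is not Alfresco's tracker code, and it ignores deletions, ACL changes and content extraction.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TrackerSketch {

    // Hypothetical committed repository transaction: an id plus the nodes it changed.
    public record Txn(long id, List<String> changedNodes) {}

    private final List<Txn> repositoryLog;                   // committed transactions, ascending id order
    private final Set<String> index = new LinkedHashSet<>(); // stand-in for the Solr index
    private long lastIndexedTxn = 0;                         // highest transaction id already indexed

    public TrackerSketch(List<Txn> repositoryLog) {
        this.repositoryLog = repositoryLog;
    }

    // One polling cycle: index every transaction committed after the last indexed one.
    // This runs outside the repository's own transactions. Returns how many were applied.
    public int trackOnce() {
        int applied = 0;
        for (Txn txn : repositoryLog) {
            if (txn.id() > lastIndexedTxn) {
                index.addAll(txn.changedNodes());
                lastIndexedTxn = txn.id();
                applied++;
            }
        }
        return applied;
    }

    public boolean isSearchable(String node) {
        return index.contains(node);
    }

    public static void main(String[] args) {
        List<Txn> log = new ArrayList<>();
        TrackerSketch tracker = new TrackerSketch(log);

        log.add(new Txn(1, List.of("doc-a")));             // committed in the repository
        System.out.println(tracker.isSearchable("doc-a")); // false: not tracked yet

        tracker.trackOnce();                               // next polling cycle
        System.out.println(tracker.isSearchable("doc-a")); // true
    }
}
```

The `main` method shows the consequence of this design: a freshly committed document is not searchable until the next tracking cycle runs, so search results are eventually consistent with the repository.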
Advantages of Solr
- Support for cross-locale ordering of d:text and d:mltext properties
- Support for simple field-based faceting through the Search Service; faceting is applied after read access is enforced
- Alfresco nodes in a cluster can search against one or more Solr servers. This saves each Alfresco node from running its own Lucene indexing subsystem with independent local index files. We will see more about this in upcoming blogs.
- Search performance can be scaled separately from the Alfresco repository (for example, two Solr instances serving a four-node cluster)
- There is a built-in Solr administration console at http://localhost:8080/solr/alfresco/admin/ for checking tokenization behavior and the terms in the index, which helps in managing indexes
- Fixed tokenization, in addition to locale-specific tokenization, for better cross-language support
- Improved performance of the PATH implementation
- Evaluates READ access at query time
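- No in-transaction indexing: documents are indexed asynchronously after the transaction commits, so the index is eventually consistent with the repository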
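The two access-related points above (faceting after read access enforcement, and READ access evaluated at query time) can be modeled in a few lines. This is a hypothetical, in-memory Java sketch rather than Alfresco's implementation: each `Doc` carries the set of authorities allowed to read it, results are filtered against the current user's authorities at query time, and facet counts are computed only over what survives the filter. The authority names and file names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ReadAccessFacetSketch {

    // Hypothetical indexed document: a facetable field plus the authorities allowed to read it.
    public record Doc(String name, String mimetype, Set<String> readers) {}

    // Query-time READ check: keep only documents readable by at least one of the user's authorities.
    public static List<Doc> readable(List<Doc> hits, Set<String> userAuthorities) {
        return hits.stream()
                .filter(d -> d.readers().stream().anyMatch(userAuthorities::contains))
                .toList();
    }

    // Facet counts per mimetype, computed over already-filtered hits so that
    // counts never reveal documents the user cannot see.
    public static Map<String, Long> facetByMimetype(List<Doc> visible) {
        Map<String, Long> counts = new TreeMap<>();
        for (Doc d : visible) {
            counts.merge(d.mimetype(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> hits = List.of(
                new Doc("q1-report.pdf", "application/pdf", Set.of("GROUP_EVERYONE")),
                new Doc("salaries.pdf", "application/pdf", Set.of("GROUP_HR")),
                new Doc("notes.txt", "text/plain", Set.of("GROUP_EVERYONE")));
        Set<String> alice = Set.of("alice", "GROUP_EVERYONE");
        System.out.println(facetByMimetype(readable(hits, alice)));
        // prints {application/pdf=1, text/plain=1}
    }
}
```

Because `facetByMimetype` only ever sees readable documents, the `application/pdf` count is 1 rather than 2, so the facet does not leak the existence of `salaries.pdf`.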
The differences between Lucene and Solr are described here.
I hope this gives you some insight into how Solr is embedded in Alfresco and how documents are searched.
Further Reading: