Alfresco internally uses the Lucene search engine to index uploaded content and its metadata. I have run into a few issues with Lucene in Alfresco that can produce unexpected search results, so I am sharing my findings here in the hope that they help other developers.
If you look at a typical Lucene query, you will see keywords such as “AND” and “OR” that help filter the search. You can think of them as the WHERE clause of a typical database query.
The problem occurs when your search term contains very common words such as “and”, “or”, “the”, or “is”. Lucene treats these as stop words: the analyzer does not index them from properties or content, so we are not able to search on them.
Title of Content1= “Lucene and Solr”
Title of Content2= “Lucene”
Title of Content3= “Solr”
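To make the indexing step concrete, here is a minimal sketch in plain Python of what an analyzer with a stop filter might keep from the three titles above. It does not use Lucene or Alfresco at all. The `STOP_WORDS` set and the `analyze` helper are hypothetical names chosen for illustration, and the real stop word list depends on the analyzer configured in your installation.

```python
import re

# Assumption: a small illustrative subset of English stop words.
# The real list depends on the analyzer configured for your index.
STOP_WORDS = {"a", "an", "and", "is", "of", "or", "the", "to"}


def analyze(text):
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


titles = {
    "Content1": "Lucene and Solr",
    "Content2": "Lucene",
    "Content3": "Solr",
}

for name, title in titles.items():
    print(name, "->", analyze(title))
# Content1 -> ['lucene', 'solr']
# Content2 -> ['lucene']
# Content3 -> ['solr']
```

Note that “and” never reaches the index for Content1, which is why searching on it cannot succeed.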
TEXT:"Lucene is Solr"
internally drops the stop word “is”, so it searches for “Lucene” followed by one or more stop words followed by “Solr”, and will therefore NOT match.
TEXT:"Lucene" AND TEXT:"is" AND TEXT:"Solr"
searches for all documents containing the three words “Lucene”, “is”, and “Solr”. As “is” is a stop word, it does not occur in the index and therefore the query returns NO result.
TEXT:"Lucene" AND TEXT:"Solr"
searches for all documents containing the two words “Lucene” and “Solr”.
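The two AND queries above can be modelled with a toy inverted index. This is only a sketch of the behavior described here, written in plain Python, and not how Lucene's query parser is actually implemented. `build_index`, `search_all`, and the stop word subset are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumption: a small illustrative subset of English stop words.
STOP_WORDS = {"a", "an", "and", "is", "of", "or", "the", "to"}


def analyze(text):
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each indexed token to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in analyze(text):
            index[token].add(name)
    return index


def search_all(index, terms):
    """Return the documents that contain every term (an AND query).

    Terms are only lowercased, not stop-filtered, to mirror the case
    described above: a stop word is looked up and simply not found.
    """
    results = None
    for term in terms:
        matches = index.get(term.lower(), set())
        results = matches if results is None else results & matches
    return sorted(results or [])


docs = {
    "Content1": "Lucene and Solr",
    "Content2": "Lucene",
    "Content3": "Solr",
}
index = build_index(docs)

print(search_all(index, ["Lucene", "is", "Solr"]))  # []
print(search_all(index, ["Lucene", "Solr"]))        # ['Content1']
```

Because “is” was never indexed, any AND query that requires it comes back empty, while the query without it finds Content1.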
Now, when you search for “Lucene and Solr”, you will get unexpected results because your search term contains “and”, which Lucene drops as a stop word.
Check this discussion for more detail on the issue.
The term you are searching is a stop word that was dropped by the analyzer you use. For example, if your analyzer uses the StopFilter, a search for the word ‘the’ will always fail (i.e. produce no hits).
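One possible workaround, sketched here under assumptions, is to strip stop words from the user's search term before building the query, so the query only asks for words that can exist in the index. `build_query` is a hypothetical helper, not an Alfresco or Lucene API, and its stop word list must be kept in line with whatever your analyzer actually uses.

```python
import re

# Assumption: must mirror the stop words of the analyzer used at index time.
STOP_WORDS = {"a", "an", "and", "is", "of", "or", "the", "to"}


def build_query(search_term, field="TEXT"):
    """Build an AND query from the words that survive stop-word removal."""
    words = re.findall(r"\w+", search_term)
    kept = [w for w in words if w.lower() not in STOP_WORDS]
    # An empty string means the term consisted only of stop words.
    return " AND ".join(f'{field}:"{w}"' for w in kept)


print(build_query("Lucene and Solr"))  # TEXT:"Lucene" AND TEXT:"Solr"
print(build_query("Lucene is Solr"))   # TEXT:"Lucene" AND TEXT:"Solr"
```

If the search term contains nothing but stop words, the helper returns an empty string, and the caller has to decide what to do in that case, for example by showing a message instead of running the search.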
Summary: There are many ways to leverage Lucene's search capabilities; this post is meant to clarify the potential problems you can face while using Lucene.
Also refer to this forum post for ideas on how to deal with stop words and the kinds of issues they create.