Lucene Stop Words Indexing Issue in Alfresco

Alfresco internally uses the Lucene search engine to index uploaded content and its metadata. I have run into a few issues with Lucene in Alfresco that can produce unexpected search results, so I am sharing my findings here in the hope that they help other developers.

If you look at a typical Lucene query, you will see operators such as “AND”, “OR” and “NOT”, which combine clauses to narrow the search. You can think of them as the WHERE clause of a typical database query.

The problem occurs when your search term contains stop words, that is, very common words such as “and”, “is” or “the”. The analyzer drops these words when it indexes property values and content, so they never reach the index and you cannot search on them.
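To make the index-time behaviour concrete, here is a minimal Python sketch of what a stop-word filter does to a title before it is indexed. It is not Lucene or Alfresco code: the `analyze` function and the short `STOP_WORDS` list are assumptions made for illustration, and the list Alfresco really uses depends on the analyzer configured for your installation.

```python
import re

# Sample stop-word list (an assumption for illustration; the real list
# depends on the analyzer configured in your Alfresco installation).
STOP_WORDS = {"a", "an", "and", "is", "or", "the"}


def analyze(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# Only the tokens returned here would be written to the index.
print(analyze("Lucene and Solr"))  # ['lucene', 'solr']
```

The word “and” never makes it into the token list, so nothing in the index can match it later.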

For instance, consider three content items with the following titles:

Title of Content1 = “Lucene and Solr”

Title of Content2 = “Lucene”

Title of Content3 = “Solr”

The query
TEXT:"Lucene is Solr"
internally drops the stop word “is” and therefore searches for “Lucene” followed by one or more stop words followed by “Solr”, and will therefore NOT match.

The query
TEXT:"Lucene" AND TEXT:"is" AND TEXT:"Solr"
searches for all documents containing the three words “Lucene”, “is” and “Solr”. Because “is” is a stop word, it does not occur in the index, and therefore the query returns no results.

The query
TEXT:"Lucene" AND TEXT:"Solr"
searches for all documents containing the two words “Lucene” and “Solr”.
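The difference between the two AND queries can be reproduced with a toy inverted index over the three example titles. This is only a sketch under stated assumptions: `build_index`, `search_all` and the `STOP_WORDS` list are hypothetical helpers written for this post, and query terms are looked up as-is to mirror the behaviour described above, not to reproduce how Lucene parses queries.

```python
import re
from collections import defaultdict

# Sample stop-word list (an assumption; not the list Alfresco ships with).
STOP_WORDS = {"a", "an", "and", "is", "or", "the"}


def analyze(text):
    """Lowercase, tokenize, and drop stop words, as done at index time."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def build_index(docs):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search_all(index, terms):
    """Return the ids of documents containing every term (an AND query)."""
    result = None
    for term in terms:
        postings = index.get(term.lower(), set())
        result = postings if result is None else result & postings
    return sorted(result or [])


docs = {
    "Content1": "Lucene and Solr",
    "Content2": "Lucene",
    "Content3": "Solr",
}
index = build_index(docs)

print(search_all(index, ["Lucene", "is", "Solr"]))  # []
print(search_all(index, ["Lucene", "Solr"]))        # ['Content1']
```

The first query fails only because “is” has no entry in the index; removing that one term is enough to find Content1.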

So when you search for “Lucene and Solr”, you may get unexpected results, because the search term contains the stop word “and”, which was never written to the index.

See this discussion for more background on the issue.

In short, a search returns no hits when the term you are searching for is a stop word that was dropped by the analyzer you use. For example, if your analyzer uses the StopFilter, a search for the word ‘the’ will always fail (i.e. produce no hits).
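One possible way to avoid that failure is to strip stop words from the user's input before assembling the query, so that only indexed terms are combined with AND. The sketch below assumes a hypothetical `build_query` helper and a sample `STOP_WORDS` list that you would need to keep in sync with your analyzer; it only builds the query string and does not call Alfresco or Lucene.

```python
import re

# Sample stop-word list (an assumption; keep it in sync with your analyzer).
STOP_WORDS = {"a", "an", "and", "is", "or", "the"}


def build_query(user_input, field="TEXT"):
    """Build an AND query over the words that are not stop words."""
    words = re.findall(r"\w+", user_input)
    terms = [word for word in words if word.lower() not in STOP_WORDS]
    return " AND ".join(f'{field}:"{term}"' for term in terms)


print(build_query("Lucene is Solr"))  # TEXT:"Lucene" AND TEXT:"Solr"
```

If the input consists only of stop words, the helper returns an empty string, so the caller should handle that case instead of sending an empty query.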

Summary: There are many ways to leverage Lucene's search capabilities; this post is meant to clarify the potential problems you can face while using it.

Also refer to this forum post for ideas on how to deal with stop words and the kinds of issues they create.

Posted under Alfresco

Author Spotlight

mitpatoliya

I love open source technologies and have been working with them since I first stepped into the software industry. Alfresco CMS is my area of expertise. I have worked on various complex implementations that involved integrating Alfresco with other technologies, and I have worked extensively with jBPM workflows and web scripts.
