Re: Configuring search term alternatives in vufind

Demian Katz

This falls under Solr’s domain – those suggestions are coming from Solr’s spell checker, which is configured here:

 

https://github.com/vufind-org/vufind/blob/master/solr/vufind/biblio/conf/solrconfig.xml#L483

 

It may be possible to adjust the behavior through the spell checker settings in that file. Additionally, Solr now has some new spell checking features which we plan to investigate for a future release, as discussed here:

 

https://vufind.org/jira/browse/VUFIND-745
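One way to see which alternatives Solr proposes for a term, and to try out settings before editing solrconfig.xml, is to send a spellcheck request to Solr directly. The Python sketch below only builds the request URL. The host, port (Solr's default 8983), core name `biblio` and `/select` path are assumptions about a local installation, and the handler must have the `spellcheck` component enabled. The `spellcheck.*` parameters themselves are standard Solr SpellCheckComponent parameters.

```python
from urllib.parse import urlencode

# Assumption: local Solr on its default port with a core named "biblio";
# adjust to match your VuFind installation.
SOLR_SELECT = "http://localhost:8983/solr/biblio/select"


def spellcheck_url(term, count=5):
    """Build a Solr select URL that asks for spelling suggestions for `term`."""
    params = {
        "q": term,
        "rows": 0,                             # suggestions only, no records
        "spellcheck": "true",                  # turn on the SpellCheckComponent
        "spellcheck.q": term,                  # the text to check
        "spellcheck.count": count,             # max suggestions per term
        "spellcheck.extendedResults": "true",  # include term frequencies
        "spellcheck.collate": "true",          # also build a corrected query
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(spellcheck_url("monkey"))
```

Opening the printed URL in a browser or with curl should return a `spellcheck` section in the JSON response listing the suggestions Solr would offer for "monkey".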

 

Please let me know if I can be of any further assistance. I’m also copying this reply to the vufind-tech list; vufind-admins is largely used for automated notices, so more people will see it on vufind-tech, in case anybody else has some suggestions.

 

- Demian

 

From: Shepard, Thomas - 0050 - MITLL [mailto:[hidden email]]
Sent: Tuesday, June 20, 2017 2:02 PM
To: [hidden email]
Subject: [Vufind-admins] Configuring search term alternatives in vufind

 

How does one modify which values VuFind chooses as its Search alternatives?

For example, what/where is the formula that results in suggesting “money” when we search for “monkey”?

Does this fall under Solr’s domain, or is there a file in VuFind that can be modified?

Thanks,

Thom

 

Thom Shepard

MIT Lincoln Lab
244 Wood St.

Lexington, MA 01523

[hidden email]

781 981 0370

 


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech

Uwe Reh
Solr offers several ways to provide spell checking, but their complexity
is hard to grasp.
 > https://cwiki.apache.org/confluence/display/solr/Spell+Checking

Most of the variants are based on the Levenshtein distance
(https://en.wikipedia.org/wiki/Levenshtein_distance). Since the distance
between "monKey" and "money" is just 1 (drop the "k"), you will hardly get
better suggestions without investing a lot of work in this area.
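The claim is easy to check with a textbook dynamic-programming Levenshtein distance. This Python version is an illustration only; Solr and Lucene use their own distance implementations, which may differ in detail.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("monkey", "money"))   # 1: drop the "k"
print(levenshtein("monkey", "monday"))  # 2: k->d, e->a
```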

In our installations, we use the DirectSolrSpellChecker with a customized
index.

The main differences from stock VuFind are:
* checking sequences of terms instead of single words
* no need to rebuild helper files on updates
* a dedicated index field to check against (filled only with names and titles)
Example:
> https://hds.hebis.de/ubffm/Search/Results?lookfor=monkey's+island

Is our solution better?
Well, I hope so, but I'm not sure.

We have noticed two drawbacks:
1. Spell checking multi-term queries is expensive. It requires evaluating
the Cartesian product of all variants of the given search terms, so we
have to deactivate spell checking when a patron searches for more than
three terms.
2. Our search rules are configured to be quite fuzzy, so there is often no
real difference between the results of the original search and those of
the suggested variants.
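The cost described in point 1 grows multiplicatively: if every term in a query has several candidate spellings, the possible corrected queries are the Cartesian product of those lists. A small Python illustration with made-up variant lists follows. In Solr, `spellcheck.maxCollationTries` bounds how many collations are actually tested against the index, so this shows the size of the search space rather than Solr's exact work.

```python
from itertools import product
from math import prod


def collation_candidates(variants_per_term):
    """Every query formed by choosing one variant for each term."""
    return [" ".join(choice) for choice in product(*variants_per_term)]


def candidate_count(variants_per_term):
    """Size of the Cartesian product, without building it."""
    return prod(len(variants) for variants in variants_per_term)


# Hypothetical variants: the original term followed by two alternatives.
query = [["monkey", "monkeys", "money"], ["island", "islands", "iceland"]]
print(collation_candidates(query)[:3])
# ['monkey island', 'monkey islands', 'monkey iceland']
print(candidate_count(query))  # 9

# Original term plus five alternatives per word: 6 ** n combinations.
for n in range(1, 6):
    print(f"{n} term(s): {candidate_count([['variant'] * 6] * n)}")
```

Five terms with six options each already yield 7,776 combinations, which is why limiting spell checking to short queries is a reasonable trade-off.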

Uwe


##########################
# Excerpts from our solrconfig.xml
#

>    <requestHandler name="edismax" class="solr.SearchHandler">
>       <lst name="defaults">
>          <str name="defType">edismax</str>
>          <str name="tie">0.1</str>
>          <str name="qf">allfields_unstemmed</str>
>          <str name="spellcheck">true</str>
>          <str name="spellcheck.collate">true</str>
>          <str name="spellcheck.extendedResults">true</str>
>          <str name="spellcheck.collateExtendedResults">true</str>
>          <str name="spellcheck.maxResultsForSuggest">1000</str>
>          <str name="spellcheck.maxCollations">2</str>
>          <str name="spellcheck.maxCollationTries">1000</str>
>          <str name="spellcheck.alternativeTermCount">5</str>
>       </lst>
>       <arr name="components">
>          <str>query</str>
>          <str>facet</str>
>          <str>mlt</str>
>          <str>stats</str>
>          <str>debug</str>
>          <str>elevator</str>
>          <str>spellcheck</str>
>       </arr>
>    </requestHandler>
>
>    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>       <str name="queryAnalyzerFieldType">xtext</str>
>       <lst name="spellchecker">
>          <str name="name">default</str>
>          <str name="classname">solr.DirectSolrSpellChecker</str>
>          <str name="field">spelling</str>
>          <int name="maxEdits">2</int>
>          <int name="minPrefix">1</int>
>          <int name="maxInspections">5</int>
>          <int name="minQueryLength">1</int>
>       </lst>
>       <float name="maxQueryFrequency">0.01</float>
>    </searchComponent>
##
# EOF
##
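To make the DirectSolrSpellChecker settings in this excerpt more concrete, here is a toy Python model of how they filter candidate terms, following their documented meaning: `maxEdits` caps the edit distance, `minPrefix` is the number of leading characters that must match, `minQueryLength` is the shortest term that gets suggestions, and `maxQueryFrequency` (a fraction of documents if below 1, otherwise an absolute count) treats a query term as correctly spelled once it occurs in more documents than that. The term frequencies are made up, `maxInspections` is not modelled, and the ranking (fewest edits first, then most frequent) is a simplification, not Lucene's actual scoring code.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def suggest(term, doc_freq, num_docs, max_edits=2, min_prefix=1,
            min_query_length=1, max_query_frequency=0.01, count=5):
    """Toy model of DirectSolrSpellChecker-style candidate filtering.

    doc_freq maps each indexed term to the number of documents containing it.
    """
    if len(term) < min_query_length:
        return []
    # Values below 1 are a fraction of all documents, otherwise a doc count.
    limit = (max_query_frequency * num_docs if max_query_frequency < 1
             else max_query_frequency)
    if doc_freq.get(term, 0) > limit:
        return []  # common enough to be treated as correctly spelled
    candidates = []
    for word, freq in doc_freq.items():
        # Skip the term itself and words whose leading characters differ.
        if word == term or word[:min_prefix] != term[:min_prefix]:
            continue
        dist = edit_distance(term, word)
        if dist <= max_edits:
            candidates.append((dist, -freq, word))
    # Fewest edits first; ties go to the more frequent term.
    return [word for _, _, word in sorted(candidates)][:count]


# Hypothetical document frequencies in a 10,000-record index.
NUM_DOCS = 10_000
DOC_FREQ = {"monkey": 3, "money": 400, "monkeys": 12,
            "donkey": 20, "monday": 50, "honey": 30}

print(suggest("monkey", DOC_FREQ, NUM_DOCS))  # ['money', 'monkeys', 'monday']
print(suggest("money", DOC_FREQ, NUM_DOCS))   # [] (in 4% of records, above 1%)
print(suggest("monkey", DOC_FREQ, NUM_DOCS, min_prefix=0))
```

With these made-up counts, "money" ranks first for "monkey" simply because it is one edit away and far more common in the index, which is the kind of effect the original question asked about.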


