searchspecs.yaml for dummies?


searchspecs.yaml for dummies?

Library

Hello,

We have observed that searches on our ‘format’ field only succeed when the phrase matches exactly, including case.

For example:

“theses” is not retrieved.
“Theses” is retrieved.

“memorandum reports” is not retrieved.
“Memorandum Reports” is retrieved.

“Memorandum” or “memorandum” alone does not retrieve anything.

The field definition in searchspecs.yaml is:

format:
   QueryFields:
    format:
      - [and, 50]
      - [onephrase, ~]

We would like the field to match words or phrases case-insensitively. What would the correct definition be? BTW, is there a “dummies guide” to searchspecs.yaml that explains the meaning of the syntax used? Thanks in advance.

Best regards,

---------------------------------
Xavier Berdaguer
Information Specialist


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
VuFind-General mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-general

Re: searchspecs.yaml for dummies?

Uwe Reh
Hi Xavier,

Sorry, I don't know of a “dummies guide” to searchspecs.yaml.
But your problem isn't located in this file. Internally, the field
'format' is handled as a single phrase, so matching is case-sensitive.

Maybe you can use 'allfields_unstemmed', which also contains the content
of format. This field is indexed as lower-case words.

> format:
>    QueryFields:
>     allfields_unstemmed:
>       - [and, 50]
>       - [onephrase, ~]
The disadvantage of this hack is that you may get false hits, because
allfields_unstemmed also contains the content of other fields.

A better solution could be to change how Solr handles the field
'format' (../biblio/conf/schema.xml).
You just have to change the line ...
> from <field name="format" type="string" indexed="true" stored="true" multiValued="true"/>
> to   <field name="format" type="textProper" indexed="true" stored="true" multiValued="true"/>
... and reindex your system.
The disadvantage of this method is, besides the need to reindex, that
you have to take care of your schema.xml changes whenever you upgrade VuFind.

Uwe






There are three places that are important for your search results:
1. The import (SolrMarc): only data that is read into the index can be
found. ;-)
2. The query generation (searchspecs.yaml), which translates the patron's
input into a Solr query.
3. The definition of the search fields (schema.xml), which defines how the
data will be stored/indexed.

Your current situation is:
1. The 'format' is imported with lower- and upper-case letters
2. The 'format' is stored and indexed
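To make step 2 concrete, here is the spec from the original post again, with comments on how it appears to be read. This is a sketch of my understanding, not official documentation. The munge names `and` and `onephrase` and the `~` weight come from the thread, but the generated query shown in the comments is only an approximation.

```yaml
format:                  # name of the search type (here the same as the Solr field)
  QueryFields:
    format:              # Solr field that is queried
      - [and, 50]        # all terms ANDed together, boosted by 50
      - [onephrase, ~]   # whole input as one quoted phrase; ~ (YAML null) = no boost

# For the input:  memorandum reports
# the generated query is roughly:
#   format:(memorandum AND reports)^50 OR format:("memorandum reports")
# Neither clause can match a string field that holds "Memorandum Reports".
```

Typing “Memorandum Reports” succeeds only because the onephrase clause then equals the stored string exactly.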

On 28.03.2017 at 10:51, Library wrote:


Re: searchspecs.yaml for dummies?

Demian Katz

As Uwe says, the key here is that when Solr stores a field as type "string," then it can only be searched by exact matching; the string type does not perform any normalization or tokenization. However, simply changing the format field to textProper will cause unwanted side effects: since format is primarily used as a facet field, and facets display the fully processed text from the Solr index rather than the initial raw input, changing the type will break format facets (everything will be lower-case, and all the words of multi-word phrases will show up as independent facet values). My recommendation, if you want an independently-searchable format facet, would be to add a copyField directive to the Solr schema that copies the current format value to a dynamic field called format_txt_mv (the suffix _txt_mv will make the field work without requiring you to modify the schema further). Then you can customize searchspecs.yaml to use format_txt_mv instead of format for searching. This should work better.
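A minimal sketch of that suggestion, assuming the copyField line is added to biblio/conf/schema.xml, that the `*_txt_mv` dynamic field mentioned above is present in the schema, and that the weights from the original spec are kept unchanged. The schema line is XML, so it appears here only as a comment. copyField is applied at index time, so a reindex is needed before the new field has any content.

```yaml
# schema.xml addition (XML, shown as a comment; placement next to the other
# copyField directives is an assumption):
#   <copyField source="format" dest="format_txt_mv"/>

# searchspecs.yaml: search the analyzed copy and leave the facet field alone
format:
  QueryFields:
    format_txt_mv:
      - [and, 50]
      - [onephrase, ~]
```

The 'format' string field itself stays untouched, so the format facets keep displaying the original values.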


However, I'm also interested in knowing more about your use case -- if you tell me more about why you are trying to search your format field like this, perhaps there is a better/simpler solution to the higher-level problem.


- Demian



From: Uwe Reh <[hidden email]>
Sent: Tuesday, March 28, 2017 6:37 AM
To: [hidden email]
Subject: Re: [VuFind-General] searchspecs.yaml for dummies?
 

Re: searchspecs.yaml for dummies?

Library

Thanks for your suggestion, Demian; I will implement it. The use case is that we would like to be able to filter/limit a search by format. For example, on the advanced search screen, you can search for some words in the title and use a second search box to specify that you want only results of a certain format. The problem is that the string field forces the user to type the format with exactly the right case, which is confusing and unexpected for users who do not know this, so they get no results.

Best regards,

Xavier

 

From: Demian Katz [mailto:[hidden email]]
Sent: 28 March 2017 16:45
To: Uwe Reh; [hidden email]
Subject: Re: [VuFind-General] searchspecs.yaml for dummies?

 


Re: searchspecs.yaml for dummies?

Demian Katz

Xavier,


Do you have an unusually long list of formats? By default, VuFind's advanced search offers a multi-select list for applying format limits. In many situations, that makes more sense than requiring the user to type anything... but of course I understand that if your use of the format field involves so many options that search is more practical than browse, going ahead with the changes discussed is probably the way to go. I just want to be sure you're not going down this path because the multi-select isn't showing up on your installation or something like that!


- Demian




From: Library <[hidden email]>
Sent: Wednesday, March 29, 2017 3:13 AM
To: Demian Katz; 'Uwe Reh'; [hidden email]
Subject: RE: [VuFind-General] searchspecs.yaml for dummies?
 
