(sorry I have so many questions - it's a side effect of coding ...)
I've been thinking about title vs. titleStr.

title -- tokenized, used for searching, displayed in search results
titleStr -- not tokenized, used for searching, used to sort by title

Am I missing the obvious? Why are we searching the same data twice? That is, why does the query formula include terms both for title and titleStr? Both are used in default and in fielded title queries.

Why is title used for "getMoreLikeThis" but titleStr used for "did you mean" suggestion?

Why are we displaying title instead of titleStr? (e.g. in the Search results)

Of course, we have other copy fields that would have the same questions applied (e.g. author, topic ...)

Naomi Dushay
[hidden email]

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech
Naomi - We use both title and titleStr for searching since the title field is stemmed. We can do better relevancy ranking by using both fields to search on. Exact matches will work better with titleStr.
So to answer your question, we use the non-stemmed "string" fields for exact matching and wildcards, and the "text" fields for stemming, character normalization, etc.

Does this answer your question?

Andrew

________________________________________
From: [hidden email] [[hidden email]] On Behalf Of Naomi Dushay [[hidden email]]
Sent: Thursday, July 31, 2008 7:51 PM
To: [hidden email]
Subject: [VuFind-Tech] use of copyfields: title vs. titleStr

(snip)
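To make the two-field approach from this reply concrete, here is a minimal Python sketch that builds a Solr query string searching both fields at once. It is not VuFind's actual query code: the function name build_title_query, the default boost of 10, and the choice of a phrase query on both fields are assumptions for illustration. Only the field names title and titleStr come from this thread.

```python
def build_title_query(user_input: str, exact_boost: int = 10) -> str:
    """Build a Lucene/Solr query string that searches both title fields.

    Hypothetical helper, not taken from VuFind. The boost value is an
    arbitrary illustration, not a recommended setting.
    """
    # Escape backslashes first, then double quotes, so the quoted phrase
    # stays well-formed inside the query string.
    phrase = user_input.replace("\\", "\\\\").replace('"', '\\"')

    # titleStr is an untokenized string field: it matches only when the
    # whole stored value equals the phrase, so an exact hit gets a boost.
    exact = f'titleStr:"{phrase}"^{exact_boost}'

    # title is tokenized and stemmed: it also matches word variants.
    stemmed = f'title:"{phrase}"'

    return f"{exact} OR {stemmed}"


if __name__ == "__main__":
    print(build_title_query("Moby Dick"))
```

With a query shaped like this, a record whose titleStr is exactly the user's input would score higher through the boosted clause, while the stemmed title clause still catches records that match only after stemming and normalization.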
On Aug 4, 2008, at 5:54 AM, Andrew Nagy wrote:

>> -----Original Message-----
>> From: Naomi Dushay [mailto:[hidden email]]
>> Sent: Friday, August 01, 2008 5:34 PM
>> To: Andrew Nagy
>>
>> (snip)
>> Thinking about: solrmarc field option to preserve the order in which
>> subfields are encountered in the resulting field value. This could
>> address some of the issues with the subject display, and possibly
>> improve the more complex titles as well.

Sneaking a look at the latest solrmarc code, it looks like this is *nearly* implemented. I'm not sure how quickly I'll have cycles to do this, as I just regenned our index and it seems to have broken some stuff in the UI, but it will be a high priority for us to fix the title and subject displays of subdivided marc data in VuFind.

This "ordered" value would be for display - an untokenized string field, preserving punctuation and the like. Stored, but not indexed, potentially. Then we'd have the indexed version, tokenized, stemmed, etc. I think.

Naomi Dushay
[hidden email]
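The difference between collecting subfields code by code and keeping them in the order encountered can be sketched in a few lines of Python. This is not solrmarc code: the function names, the (code, value) tuple representation, and the sample subject heading are all made up for illustration.

```python
from typing import List, Tuple

# A subfield is modelled here as a (code, value) pair, in record order.
Subfield = Tuple[str, str]


def grouped_by_code(subfields: List[Subfield], codes: str) -> str:
    """Collect values one code at a time; the original order is lost."""
    return " ".join(
        value
        for wanted in codes
        for code, value in subfields
        if code == wanted
    )


def in_record_order(subfields: List[Subfield], codes: str) -> str:
    """Keep the wanted subfields in the order they appear in the record."""
    wanted = set(codes)
    return " ".join(value for code, value in subfields if code in wanted)


# Hypothetical subdivided subject heading: $a Dogs $z France $x History $z Paris
subject = [("a", "Dogs"), ("z", "France"), ("x", "History"), ("z", "Paris")]

if __name__ == "__main__":
    print(grouped_by_code(subject, "axz"))   # Dogs History France Paris
    print(in_record_order(subject, "axz"))   # Dogs France History Paris
```

The second form is the kind of value that would suit a stored display field, since the subdivisions read in the order the cataloger entered them.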