Splitting the Lucene index

Barnett, Jeffrey
As I mentioned in earlier posts, supporting a multi-million-record index seems to get disproportionately harder as the index grows.  Would it be practical to divide the records by some method so that a single institution's records are spread across multiple indices, but still searched and ranked as a whole?  Physically this isn't much different from the federated search that is already on the "to do" list, but with some planning it could be a simpler problem to solve, and it would be a performance booster as well.  I have a 6-CPU cluster, but only one CPU at a time can work on a single index.
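
To make "divide the records by some method" concrete, here is a minimal sketch of one possible method: hashing each record ID to pick an index. This is only an illustration, not something VuFind or Solr does for you; the function names, the choice of CRC32, and the sample IDs are all assumptions.

```python
import zlib


def shard_for(record_id: str, num_shards: int) -> int:
    """Map a record ID to a shard number in the range [0, num_shards)."""
    # CRC32 is stable across runs and machines, unlike Python's built-in
    # hash(), so a record always lands in the same index.
    return zlib.crc32(record_id.encode("utf-8")) % num_shards


def partition(record_ids, num_shards):
    """Group record IDs into num_shards lists, one list per index."""
    shards = [[] for _ in range(num_shards)]
    for record_id in record_ids:
        shards[shard_for(record_id, num_shards)].append(record_id)
    return shards


if __name__ == "__main__":
    # Hypothetical record IDs, spread over six indices (one per CPU).
    ids = [f"rec{n}" for n in range(20)]
    for number, members in enumerate(partition(ids, 6)):
        print(number, members)
```

Hashing on the ID rather than on institution or format spreads each institution's records roughly evenly, so each index would hold a similar share of the collection.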

-------------------------------------------------------------------------
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech

Re: Splitting the Lucene index

wsgrah
Administrator
I think you're describing Solr's distributed search
(http://wiki.apache.org/solr/DistributedSearch). You can do that, or you
can replicate the index across the servers and put a load balancer in
front of them. The 1.3 release has a post-commit handler that lets you
call the replication scripts.
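
As a rough sketch of what a distributed search request looks like: Solr takes a `shards` parameter listing the indices to query, and the server receiving the request merges the ranked results. The host names, the port, and the helper name below are made up for illustration; only the `shards`, `q`, and `rows` parameters come from Solr itself.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, query, rows=10):
    """Build a Solr select URL that fans the query out to several shards.

    coordinator: host:port/path of the Solr instance receiving the request.
    shards: list of host:port/path strings, one per index to search.
    """
    params = {
        "q": query,
        "rows": rows,
        # Solr expects a comma-separated list of shard addresses.
        "shards": ",".join(shards),
    }
    # Leave ':', '/' and ',' unescaped so the URL stays readable.
    return f"http://{coordinator}/select?" + urlencode(params, safe=":/,")


if __name__ == "__main__":
    # Hypothetical two-server setup.
    servers = ["host1:8983/solr", "host2:8983/solr"]
    print(distributed_select_url(servers[0], servers, "lucene"))
```

Any of the servers can act as the coordinator, so the same URL pattern can sit behind a load balancer.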

Wayne

--
/**
  * Wayne Graham
  * Earl Gregg Swem Library
  * PO Box 8794
  * Williamsburg, VA 23188
  * 757.221.3112
  * http://swem.wm.edu/blogs/waynegraham/
  */


