VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors


Leila Gonzales

Hi all,

 

I'm upgrading from VuFind 2.4.1 to 3.1.3, and when indexing my .mrc files I am running into Java heap space / out-of-memory errors. The file I'm loading has only 27,808 records in it, and I had no problem indexing it with VuFind 2.4.1, so I'm wondering what is causing the issue since upgrading to 3.1.3.

From what I can tell, the import-marc.sh script is dying at the commit stage.

Now Importing /incoming/processed/ importrecords.mrc ...

Mar 22, 23:43:55 /usr/lib/jvm/default-java/bin/java -Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0 -Duser.timezone=UTC  -jar /usr/local/vufind/import/solrmarc_core_3.0.6.jar /usr/local/vufind/local/import/import.properties -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc

0 [main] DEBUG org.solrmarc.driver.ConfigDriver  - Using config /usr/local/vufind/local/import/import.properties to initialize SolrMarc

3 [main] DEBUG org.solrmarc.tools.PropertyUtils  - Opening file: /usr/local/vufind/local/import/import.properties

9 [main] INFO org.solrmarc.driver.ConfigDriver  - Effective Command Line is:

10 [main] INFO org.solrmarc.driver.ConfigDriver  -    java -jar solrmarc_core.jar IndexDriver -reader_opts import.properties -dir /usr/local/vufind/local/import|/usr/local/vufind/import;local/import -config "marc.properties, marc_local.properties" -solrURL http://localhost:8080/solr/biblio/update -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/ importrecords.mrc

INFO [main] (ValueIndexerFactory.java:116) - Using directory: /usr/local/vufind/import/index_java as location of java sources

INFO [main] (PropertyUtils.java:313) - Opening file (instead of 2 other options): /usr/local/vufind/local/import/import.properties

DEBUG [main] (SolrCoreLoader.java:80) - Found Solrj class org.apache.solr.client.solrj.impl.HttpSolrClient

INFO [main] (IndexDriver.java:165) - Reading and compiling index specifications: marc.properties, marc_local.properties

INFO [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc.properties

INFO [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc_local.properties

DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: crrelDbaseName.bsh

DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefFormat.bsh

DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefPublisher.bsh

DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefContainerInfo.bsh

DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefKeywordTerms.bsh

DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefCategoryCodes.bsh

DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefNote.bsh

DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefDOIURL.bsh

INFO [main] (IndexDriver.java:93) - Opening input files: [/incoming/processed/importrecords.mrc]

INFO [main] (ThreadedIndexer.java:221) - Done with all indexing, finishing writing records to solr

ERROR [SolrUpdateOnError_      143170_      143191] (Indexer.java:421) - Failed on single doc with id :       143179

ERROR [SolrUpdateOnError_      143170_      143191] (Indexer.java:431) - Error from server at http://localhost:8080/solr/biblio: Exception writing document id 143179 to the index; possible analysis error.

 

The errors I am getting in the SolrAdmin UI logs are:

org.apache.solr.common.SolrException: Exception writing document id 143179 to the index; possible analysis error.

...

Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed

...

Caused by: java.lang.OutOfMemoryError: Java heap space

 

I've tried modifying the following files, but nothing has worked so far:

solr.sh:  set SOLR_HEAP to 2G

import-marc.sh: set   INDEX_OPTIONS='-Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0'

(My previous vufind.sh settings under VuFind 2.4.1 were JAVA_OPTIONS="-server -Xms1024m -Xmx1024m -XX:+UseParallelGC -XX:NewRatio=5", and I also tried setting Xms/Xmx to 1024M in import-marc.sh and solr.sh, but to no avail.)

 

And my current settings are:

Using Solr root directory: /usr/local/vufind/solr/vendor

Using Java: /usr/lib/jvm/default-java/bin/java

java version "1.7.0_79"

OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)

OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

Backing up /usr/local/vufind/solr/vufind/logs/solr.log

Backing up /usr/local/vufind/solr/vufind/logs/solr_gc.log

 

Starting Solr using the following settings:

    JAVA            = /usr/lib/jvm/default-java/bin/java

    SOLR_SERVER_DIR = /usr/local/vufind/solr/vendor/server

    SOLR_HOME       = /usr/local/vufind/solr/vufind

    SOLR_HOST       =

    SOLR_PORT       = 8080

    STOP_PORT       = 7080

    JAVA_MEM_OPTS   = -Xms2G -Xmx2G

    GC_TUNE         = -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80

    GC_LOG_OPTS     = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/usr/local/vufind/solr/vufind/logs/solr_gc.log

   SOLR_TIMEZONE   = UTC

    SOLR_OPTS        = -Xss256k

    SOLR_ADDL_ARGS   = -Dsolr.log=/usr/local/vufind/solr/vufind/logs

 

Any ideas on what to do next? Should I try adjusting the GC_TUNE settings in solr/vendor/bin/solr.in.sh? Is there somewhere else I should be looking?

 

Thanks for any guidance you can send my way.

 

Kind regards,
Leila

 



Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Uwe Reh
Hi,

If I understand you correctly, the problem seems to be well localized:

On 23.03.2017 at 05:09, Leila Gonzales wrote:
> ...
> The errors I am getting in the SolrAdmin UI logs are:
> org.apache.solr.common.SolrException: Exception writing document id
> 143179 to the index; possible analysis error.
> ...
> Caused by: org.apache.lucene.store.AlreadyClosedException: this
> IndexWriter is closed
> ...
> Caused by: java.lang.OutOfMemoryError: Java heap space

Increasing the heap for Solr should be the generic way to solve the problem.

?? Well ??
A 2 GB heap should be more than enough for 27,808 records.
If the initial index was empty, you should provide the full logs.
> Backing up /usr/local/vufind/solr/vufind/logs/solr.log
> Backing up /usr/local/vufind/solr/vufind/logs/solr_gc.log

If the index already contains ~1M documents, it could be a good idea to be
generous and offer your Solr 6 GB or more. The Solr UI gives you brief
information about heap usage. Later on, you may downsize the heap to a
reasonable value.
Tuning the garbage collection (GC_TUNE) should be your last option.
Upgrading the JVM to Java 8 may be more effective.

Uwe



Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Demian Katz
In reply to this post by Leila Gonzales

Leila,


I'm copying this to solrmarc-tech in case Bob has anything to add.


The biggest difference between the SolrMarc in VuFind 2.4.1 and the SolrMarc in 3.1.3 is that the newer version posts more documents to Solr all at once, while the older version took a "one at a time" approach. I would guess that receiving large batches of records may be contributing to the memory error, though as Uwe said in his reply to you, I wouldn't expect 27,808 records to cause a problem. I've successfully indexed hundreds of thousands. Of course, if any of these records are extremely large and complex, that could make a difference.
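
To illustrate the difference in very rough terms, here is a conceptual sketch. This is NOT SolrMarc's actual code; it just contrasts the two approaches using the stock SolrJ HttpSolrClient that shows up in your log, with a made-up URL, field names and batch size:

// Conceptual sketch only -- NOT SolrMarc's actual code. It just contrasts
// "one document per update request" with "one batch per update request"
// using the stock SolrJ HttpSolrClient named in the log above.
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchingSketch {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient("http://localhost:8080/solr/biblio");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("title", "Record " + i);

            // Old behaviour (roughly): send each record as soon as it is built.
            // solr.add(doc);

            // New behaviour (roughly): accumulate records and send them together,
            // so client and server each hold a whole batch in memory at once.
            batch.add(doc);
        }
        solr.add(batch);

        solr.commit();
        solr.close();
    }
}

The practical upshot is that with batching, a whole group of documents is held in memory at once on both the client and the Solr side until the update completes, which is where I would guess the extra memory pressure comes from.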


It might be interesting to see whether SolrMarc's multi-threaded mode behaves any differently -- I wonder if tweaking the thread settings could make a difference. You can read about the thread options here:


https://github.com/solrmarc/solrmarc/wiki/Other-command-line-options


You can use these options through the EXTRA_SOLRMARC_SETTINGS environment variable -- e.g.


EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.solrj.threadcount=1 -Dsolrmarc.indexer.threadcount=2" ./import-marc.sh ...


(Note that the values in that example aren't a recommendation; they are just there to illustrate the syntax.)


I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).


Good luck! Let us know if you need more help!


- Demian



Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila Gonzales

Thank you very much, Uwe and Demian. I did some more troubleshooting this morning, turned off indexing of the geographic search/display fields, re-ran the import, and Solr indexed everything fine in a few seconds. I'm now going to work out whether the issue is with the geographic indexing code or with one of the records in the batch. I suspect (and hope) it's the latter.
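
(For the record-by-record check, I'll probably just do something crude like the following to dump the coordinate-bearing fields so I can eyeball them; the file path, the 034/255 tags and the whole approach are only my first guess at where the problem might be hiding:)

// Quick-and-dirty scan to print the 034 / 255 fields with their record IDs.
// File path and field tags are just my guess at where the problem could be.
import java.io.FileInputStream;

import org.marc4j.MarcStreamReader;
import org.marc4j.marc.Record;
import org.marc4j.marc.VariableField;

public class DumpCoordinateFields {
    public static void main(String[] args) throws Exception {
        MarcStreamReader reader = new MarcStreamReader(
                new FileInputStream("/incoming/processed/importrecords.mrc"));
        while (reader.hasNext()) {
            Record record = reader.next();
            for (String tag : new String[] { "034", "255" }) {
                for (VariableField field : record.getVariableFields(tag)) {
                    System.out.println(record.getControlNumber() + " " + field);
                }
            }
        }
    }
}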

 

Cheers,
Leila

 

 

 


Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Demian Katz

It may pay off to convert the BeanShell scripts into pure Java to take advantage of the new SolrMarc. Let me know if I can be of any help with this process. There are some notes here:


https://vufind.org/wiki/indexing:solrmarc:custom_java_best_practices
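
Just to give a rough idea of the shape of the result, a hypothetical conversion of one of your scripts (say, georefPublisher.bsh) might end up looking something like the sketch below. The package, class name, method name and the 260$b logic are all invented on my part, so treat it as an illustration of the pattern rather than working code for your data:

// Hypothetical sketch of a BeanShell script rewritten as plain Java.
// Package, class name, method name and the 260$b logic are invented;
// see the wiki page above for the real conventions and method signatures.
package org.solrmarc.index;

import org.marc4j.marc.DataField;
import org.marc4j.marc.Record;

public class GeorefIndexer {
    /**
     * Example custom method returning the publisher from 260 subfield b.
     * It would be referenced from the index spec roughly as:
     *   publisher = custom, getGeorefPublisher
     */
    public String getGeorefPublisher(Record record) {
        DataField field = (DataField) record.getVariableField("260");
        if (field == null || field.getSubfield('b') == null) {
            return null;
        }
        return field.getSubfield('b').getData().trim();
    }
}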


As always, I'm happy to elaborate as needed.


- Demian




Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila Gonzales

Hi Demian,

 

To call the Java code from marc_local.properties, would I use the same syntax (script(NAME.java), method), and would I put the .java code in the index_scripts directory?

 

Thanks!
Leila

 


Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Demian Katz

Actually, you use the same syntax as you would for built-in custom methods:


custom, method


You don't have to specify which class it is in as long as each method name is completely unique.


- Demian




Re: [solrmarc-tech] Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila Gonzales

So I could basically copy the methods out of the VuFindIndexer.java file, rename them, put them in a new .java file (say, "Coordinate.java"), and put that file in the vufind/import/index_java/src/org/solrmarc/index/ directory. As long as the Coordinate.java file extends SolrIndexer, I should be good to go, correct?
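
In other words, a rough skeleton along these lines, where the names are just placeholders and the "extends SolrIndexer" part is exactly what I'm unsure about:

// Rough skeleton of what I'm picturing -- names are placeholders, and
// whether the class needs to extend SolrIndexer is part of my question.
package org.solrmarc.index;

import org.marc4j.marc.Record;

public class Coordinate /* extends SolrIndexer ? */ {
    // A renamed copy of one of the VuFindIndexer.java methods, referenced
    // from marc_local.properties as something like:
    //   long_lat = custom, getMyCoordinates
    public String getMyCoordinates(Record record) {
        // ...coordinate-extraction logic copied over from VuFindIndexer...
        return null;
    }
}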

 

Thanks for the help. I really appreciate it.

Leila

 


Re: [solrmarc-tech] Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Demian Katz

Leila,


You might find it helpful to look at how I have refactored the VuFindIndexer code for VuFind 4.0 -- instead of extending the SolrIndexer, I just created a bunch of stand-alone classes which access an instance of the indexer through a singleton pattern as needed. You should be able to use any of the classes here as a starting point example:


https://github.com/vufind-org/vufind/tree/master/import/index_java/src/org/vufind/index


If you're not sure how to do any particular thing, let me know and I can point you to a more specific example. I suspect that comparing my version of your geo code to the beanshell version should be pretty enlightening.


- Demian
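
For illustration, a stand-alone class along the lines Demian describes might look something like the sketch below. This is a hypothetical example rather than VuFind's actual code: the package, class, and method names are placeholders, it assumes the marc4j Record API, and it leaves out the singleton accessor Demian mentions for reaching the main indexer.

package org.vufind.index;

import java.util.LinkedHashSet;
import java.util.Set;

import org.marc4j.marc.DataField;
import org.marc4j.marc.Record;
import org.marc4j.marc.Subfield;
import org.marc4j.marc.VariableField;

// Hypothetical stand-alone indexing class: it does not extend SolrIndexer.
// SolrMarc only needs a public method that accepts a marc4j Record.
public class PublisherTools
{
    // Collect publisher names from MARC 260$b and 264$b.
    public Set<String> getPublishers(final Record record)
    {
        Set<String> publishers = new LinkedHashSet<String>();
        for (String tag : new String[] { "260", "264" }) {
            for (VariableField vf : record.getVariableFields(tag)) {
                DataField field = (DataField) vf;
                for (Subfield sf : field.getSubfields('b')) {
                    // Strip the trailing punctuation that is common in MARC data.
                    publishers.add(sf.getData().replaceAll("[ ,:;]+$", ""));
                }
            }
        }
        return publishers;
    }
}

Under those assumptions, the properties file could then reference it with the plain "custom, getPublishers" syntax discussed earlier in the thread, as long as the method name is unique across classes.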





Re: [solrmarc-tech] Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila Gonzales

Thank you so much Demian!

Leila

 


Re: [solrmarc-tech] Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Robert Haschart
In reply to this post by Demian Katz
Weighing in on this part of Demian's message:
"I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the 'other command-line options' wiki page)."
The number of records sent in a chunk is 640. You can set the system property that controls this value on the command line as follows:

-Dsolrmarc.indexer.chunksize=100

That system property, along with two others that weren't previously documented, is now on the wiki page Demian pointed to.

-Bob Haschart



Re: [solrmarc-tech] Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila Gonzales

Thanks Bob. I’ve done some more troubleshooting and it looks like the issue is probably due to the indexing of spatial coordinates. When I don’t index the geo fields, the import runs just fine.

 

It appears there were some changes between Solr 4.2 and 5.5, and I may have to update the coordinate indexing code for VuFind. I'm looking into that some more and hunting down the records whose coordinates are probably causing the issue. I did find this thread, which may prove useful: http://lucene.472066.n3.nabble.com/Spatial-Dataimport-full-import-results-in-OutOfMemory-for-a-rectangle-defining-a-line-tp4034928p4035372.html

 

The first error I see in the solr.log, prior to the “org.apache.solr.common.SolrException: Exception writing document id 143179 to the index; possible analysis error.” errors, is:

 

2017-03-23 16:25:28.380 INFO  (qtp2082400824-13) [   x:biblio] o.a.s.u.p.LogUpdateProcessorFactory [biblio]  webapp=/solr path=/update params={wt=javabin&version=2}{add=[73231 (1562677701975736320), 73233 (1562677701979930624), 73234 (1562677701986222080), 73322 (1562677702001950720), 73323 (1562677702007193600), 73377 (1562677702014533632), 73380 (1562677702022922241), 73381 (1562677702025019392), 73382 (1562677702028165120), 73497 (1562677702030262272), ... (97 adds)]} 0 739185

2017-03-23 16:25:30.197 ERROR (qtp2082400824-13) [   x:biblio] o.a.s.s.HttpSolrCall null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space

        at org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:604)

        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)

        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)

        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)

        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)

        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)

        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)

        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)

        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)

        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)

        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)

        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)

        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)

        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)

        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)

        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)

        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

        at org.eclipse.jetty.server.Server.handle(Server.java:499)

        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)

        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)

        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)

        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)

        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.OutOfMemoryError: Java heap space

        at org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree.stringToBytesPlus1(GeohashPrefixTree.java:92)

        at org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree.access$000(GeohashPrefixTree.java:37)

        at org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree$GhCell.<init>(GeohashPrefixTree.java:104)

        at org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree$GhCell.getSubCells(GeohashPrefixTree.java:131)

        at org.apache.lucene.spatial.prefix.tree.LegacyCell.getNextLevelCells(LegacyCell.java:141)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:150)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)

        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.createCellIteratorToIndex(RecursivePrefixTreeStrategy.java:128)

        at org.apache.lucene.spatial.prefix.PrefixTreeStrategy.createIndexableFields(PrefixTreeStrategy.java:151)

        at org.apache.lucene.spatial.prefix.PrefixTreeStrategy.createIndexableFields(PrefixTreeStrategy.java:146)

        at org.apache.lucene.spatial.prefix.PrefixTreeStrategy.createIndexableFields(PrefixTreeStrategy.java:137)

        at org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:211)

        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:47)

        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:122)

        at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)

        at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:280)

        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)

        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)

        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)

        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)

        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1086)

        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:709)

        at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)

        at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)

 

- Leila
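
The thread linked above concerns an OutOfMemoryError triggered by a rectangle that collapses to a line, which would be consistent with the deep GeohashPrefixTree / RecursivePrefixTreeStrategy recursion in this stack trace. A hypothetical guard along the following lines, placed in whatever custom code builds the coordinate value, could filter out-of-range or degenerate boxes before they reach Solr; the class name, the ENVELOPE output format, and the choice to drop rather than repair bad boxes are illustrative assumptions, not VuFind's actual behaviour.

// Hypothetical helper for validating a MARC-derived bounding box before it is
// handed to Solr's spatial (RPT) field. All names and thresholds are placeholders.
public class CoordinateGuard
{
    // Returns an ENVELOPE(minX, maxX, maxY, minY) string for Solr, or null if the
    // box is out of range or degenerate (zero width or height); degenerate boxes
    // are the kind of shape the linked thread reports as a heap killer.
    public static String toEnvelope(double west, double east, double south, double north)
    {
        if (west < -180.0 || east > 180.0 || south < -90.0 || north > 90.0) {
            return null; // out of range: bad or unconverted MARC 034 data
        }
        if (east <= west || north <= south) {
            return null; // a line or a point; index these differently, or skip them
        }
        return String.format("ENVELOPE(%f, %f, %f, %f)", west, east, north, south);
    }
}

Logging the record ID whenever the method returns null would also make it easier to find the specific records that are causing the trouble.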

 

 

From: Robert Haschart [mailto:[hidden email]]
Sent: Thursday, March 23, 2017 11:06 AM
To: [hidden email]; Leila Gonzales; [hidden email]
Subject: Re: [solrmarc-tech] Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

 

Weighing in on this part of Demian's message.

I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

The number that is sent in a chunk is 640 records.   You can set the system property that controls this value on the command line thusly:

-Dsolrmarc.indexer.chunksize=100

that system property (and two others that weren't documented) are now on the Wiki page Demian pointed to.

-Bob Haschart

On 3/23/2017 8:07 AM, Demian Katz wrote:

Leila,

 

I'm copying this to solrmarc-tech in case Bob has anything to add.

 

The biggest difference between the SolrMarc in VuFind 2.4.1 and the SolrMarc in 3.1.3 is that the newer version posts more documents to Solr all at once, while the older version took a "one at a time" approach. I would guess that receiving large batches of records may be contributing to the memory error, though as Uwe said in his reply to you, I wouldn't expect 27,808 records to cause a problem. I've successfully indexed hundreds of thousands. Of course, if any of these records are extremely large and complex, that could make a difference.

 

It might be interesting to see if using SolrMarc's multi-threaded mode makes any difference -- I wonder if tweaking the thread settings could make a difference. You can read about thread options here:

 

https://github.com/solrmarc/solrmarc/wiki/Other-command-line-options

 

You can use these options through the EXTRA_SOLRMARC_SETTINGS environment variable -- e.g.

 

EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.solrj.threadcount=1 -Dsolrmarc.indexer.threadcount=2" ./import-marc.sh ...

 

(Note that the values in that example aren't a suggestion, just an example).

 

I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

 

Good luck! Let us know if you need more help!

 

- Demian



Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila Gonzales
In reply to this post by Leila Gonzales

Hi everyone,

 

I just wanted to report back that I’ve found the issue lies with two of my files, not with the location.bsh/indexing routines, so that’s the good news.

 

For some reason, only two of my .mrc files are having the issue: one has 27,000 records and the other 93,000. However, I am able to index another .mrc file with coordinate data that has ~200,000 records without any problem. Furthermore, the indexing problems for these two files only occur when I try to index the coordinate field. I’ve checked the data in the coordinate fields, and there is nothing special in terms of odd coordinate pairs, typos, etc. All of the coordinate combinations we use in these files have been indexed successfully from other files on VuFind 3.1.3.

 

Also, the indexing itself goes fine; the out of memory errors happen when the records are sent to Solr for the commit stage. There is also no consistent failure point… the file stops indexing at a different set of records each time, which makes it very difficult to say which record or set of records is causing the issue.

 

I’ve also tried the following, all to no avail:

1. Changing the -Dsolrmarc.indexer.chunksize option (tried 1, 5, 50, 100, and 500)

2. Changing the autoCommit time to 60 seconds

3. Changing the number of SolrJ threads (-Dsolrmarc.solrj.threadcount=8)

4. Upgrading the JVM to Java 8 (java version "1.8.0_121", Java(TM) SE Runtime Environment (build 1.8.0_121-b13)); the catch is that I can’t get Solr to recognize the Java upgrade, so it still points to the Java 7 instance.

 

The error I’m consistently getting is the one below (except that the document id is never the same one!):

2017-03-25 07:49:05.143 ERROR (qtp1385340628-14) [   x:biblio] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id 120336 to the index; possible analysis error.
...
Caused by: java.lang.OutOfMemoryError: Java heap space
       at org.apache.lucene.util.fst.BytesStore.writeByte(BytesStore.java:91)
       at org.apache.lucene.util.fst.FST.<init>(FST.java:295)
       at org.apache.lucene.util.fst.Builder.<init>(Builder.java:172)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock.compileIndex(BlockTreeTermsWriter.java:594)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:775)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:1085)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:1046)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:456)
       at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:198)
       at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:107)
       at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:126)
       at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:422)
       at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:503)
       at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:357)
       at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:436)
       at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
       at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
       at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
       at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
       at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
       at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
       at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
       at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1086)
       at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:709)
       at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
       at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
       at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
       at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
       at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:260)
       at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
       at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
       at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:145)

 

Uwe, you mentioned sending along the solr.log and solr_gc.log. I’m happy to send those to you off-list if you have a chance to look at them.

 

Thanks again everyone for any help you can provide or suggestions for where I should look next.

 

Kind regards,
Leila


Re: [solrmarc-tech] RE: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Demian Katz

Can you easily split the offending files into chunks? I'd be interested to see whether there is a particular chunk size that always works for these records, or whether splitting the files lets you narrow things down to a particular run of records that is related to the problem. I realize that the fact that the ID is always different, and that larger files work correctly, argues against a single offending record or a simple size limit, but I still think the chunking approach might provide some additional clues....
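
If it helps, here is a rough, hypothetical splitter (plain Java, not part of VuFind or SolrMarc; the class name and chunk size are purely illustrative) that cuts a raw ISO 2709 .mrc file on the record terminator byte (0x1D), so each chunk can be run through import-marc.sh separately:

import java.io.*;

// Hypothetical helper, not part of VuFind or SolrMarc: split a binary MARC
// (ISO 2709) file on the record terminator byte 0x1D into smaller files.
public class MarcSplitter {
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java MarcSplitter <input.mrc> <recordsPerChunk>");
            return;
        }
        File input = new File(args[0]);
        int perChunk = Integer.parseInt(args[1]);
        int chunk = 0;
        int count = 0;
        OutputStream out = null;
        try (InputStream in = new BufferedInputStream(new FileInputStream(input))) {
            int b;
            while ((b = in.read()) != -1) {
                // Open the next chunk file lazily so an empty trailing file is never created.
                if (out == null) {
                    out = new BufferedOutputStream(
                            new FileOutputStream(input.getName() + ".chunk" + chunk++));
                }
                out.write(b);
                // 0x1D marks the end of one MARC record.
                if (b == 0x1D && ++count % perChunk == 0) {
                    out.close();
                    out = null;
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        System.out.println("Wrote " + chunk + " file(s) containing " + count + " record(s).");
    }
}

Running it as, say, "java MarcSplitter yourfile.mrc 5000" (the file name and count are just placeholders) and then indexing the resulting chunk files one at a time should show whether the failure follows a particular range of records or only appears once enough records are batched together.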


- Demian





Re: [solrmarc-tech] RE: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Tod Olson
Hi Leila,

On the Java version, in the very first message $JAVA is set to /usr/lib/jvm/default-java/bin/java. I would guess that one of those directories is a symlink to a specific installed JVM distribution, so you might "ls -ld" each level. There should be a way to change the default through whatever package manager controls your Java installation. Without having to chase that down, you could try explicitly setting JAVA_HOME to the base directory of your Java 1.8 distro and see if that makes a difference.

Best,

-Tod


Re: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila Gonzales
In reply to this post by Leila Gonzales

Thank you Demian and Tod. I was finally able to figure out the issue, and I want to document it here in case anyone else runs into it before I submit the forthcoming PR.

 

The problem was that our files had coordinate pairs in which one coordinate was at the South pole (S900000) and the other was within 5 minutes of the South pole (i.e. S895900). (For some reason, there is no issue with the North pole.) Another case that failed during Solr indexing was records with E001 and W000 in the west and east coordinates. From what I can tell from http://lucene.472066.n3.nabble.com/Spatial-Dataimport-full-import-results-in-OutOfMemory-for-a-rectangle-defining-a-line-td4034928.html#a4035372, the problem seems to be that when a bounding box collapses to (nearly) a line or a point, Solr runs out of memory trying to create too many spatial grids, or something to that effect.

 

I’ll be submitting a pull request that traps these cases in the getAllCoordinates routine (in location.bsh and VuFindIndexer.java), so that the coordinates are not processed in these cases and an error message is produced during indexing, allowing the user to fix the records.
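
For illustration only, the kind of trap involved might look roughly like the sketch below (plain Java rather than the actual location.bsh / VuFindIndexer.java change; the class name, method name, and thresholds are assumptions, not what the PR will contain):

// Hypothetical sketch, not the actual VuFind patch: reject bounding boxes that
// collapse onto a pole or look accidentally inverted, and report them so the
// records can be corrected instead of being handed to Solr's spatial indexing.
public class CoordinateGuard {

    // Roughly five minutes of arc in decimal degrees (an assumed threshold).
    private static final double MIN_EXTENT = 5.0 / 60.0;

    // west/east/south/north are decimal degrees; returns true when the box
    // looks safe to index.
    public static boolean looksIndexable(double west, double east,
                                         double south, double north) {
        // Both latitudes sit at or within ~5 minutes of the same pole, so the
        // box degenerates to a sliver at the pole (the S900000 / S895900 case).
        boolean pinchedAtSouthPole = south <= -90.0 + MIN_EXTENT && north <= -90.0 + MIN_EXTENT;
        boolean pinchedAtNorthPole = south >= 90.0 - MIN_EXTENT && north >= 90.0 - MIN_EXTENT;

        // West and east reversed by only a small amount (like the E001 / W000
        // pair above): more likely a data error than an intentional near-global
        // wrap. The five-degree cutoff here is an assumption, not a known value.
        boolean suspiciousReversal = west > east && (west - east) < 5.0;

        if (pinchedAtSouthPole || pinchedAtNorthPole || suspiciousReversal) {
            System.err.println("Skipping suspicious bounding box: W=" + west
                    + " E=" + east + " S=" + south + " N=" + north);
            return false;
        }
        return true;
    }
}

The actual change will go into getAllCoordinates, but the idea is the same: catch the degenerate shapes before they reach Solr and log enough detail to locate the offending records.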

 

I also found some other minor bugs in the validateCoordinate routine (Solr supports longitudinal wrapping, so we don't have to check for West > East anymore), and in map_tab_ol.js: coordinates that crossed the dateline were not being displayed properly.

 

Cheers,

Leila

 

From: Tod Olson [mailto:[hidden email]]
Sent: Saturday, March 25, 2017 10:43 AM
To: Demian Katz
Cc: [hidden email]; [hidden email]
Subject: Re: [VuFind-Tech] [solrmarc-tech] RE: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

 

Hi Leila,

 

On the Java version, in the very first message $JAVA is set to /usr/lib/jvm/default-java/bin/java. I would guess that one of those directories is a symlink to a specific installed JVM distribution, so you might "ls -ld" each level. There should be a way to change the default through whatever package manager controls your Java installation. Without having to chase that down, you could try explicitly setting JAVA_HOME to the base directory of your Java 1.8 distro and see if that makes a difference.
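
Concretely, something along these lines (the Java 8 path is only an example; substitute whatever directory your Java 8 package actually installed):

# Follow the symlink chain to see which JVM default-java really points to
ls -ld /usr/lib/jvm/default-java
ls -ld /usr/lib/jvm/default-java/bin/java

# Then point the environment at the Java 8 install explicitly before starting Solr / running the import
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # example path only
"$JAVA_HOME/bin/java" -version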

 

Best,

 

-Tod

 

On Mar 25, 2017, at 7:27 AM, Demian Katz <[hidden email]> wrote:

 

Can you easily split the offending files into chunks? I'd be interested to see if there is a particular chunk size that always works for these records, or if, by splitting the files, you can narrow things down to a particular run of records related to the problem. I realize that the fact that the ID is always different, and that larger files work correctly, argues against a particular offending record or a particular size limit, but I still think the chunking approach might provide some additional clues....
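
For splitting, if you have the YAZ tools installed, something like the following should work; I believe yaz-marcdump's -s (output prefix) and -C (records per chunk) options do the splitting, but double-check against yaz-marcdump -h, and the chunk size and file names here are only examples:

# Split the file into chunks of 5000 records named chunk0000000, chunk0000001, ...
yaz-marcdump -s chunk -C 5000 /incoming/processed/importrecords.mrc > /dev/null

# Index the chunks one at a time to see whether a particular one fails reliably
for f in chunk*; do ./import-marc.sh "$f"; done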

 

- Demian

 


From: [hidden email] <[hidden email]> on behalf of Leila Gonzales <[hidden email]>
Sent: Saturday, March 25, 2017 4:25 AM
To: [hidden email]; [hidden email]
Subject: [solrmarc-tech] RE: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

 

Hi everyone,

 

I just wanted to report back that the issue is with two of my files, not with the location.bsh/indexing routines, so that's the good news.

 

For some reason, only two of my .mrc files are having the issue: one has 27,000 records and the other 93,000. However, I am able to index another .mrc file with coordinate data that has ~200,000 records with no issues. Furthermore, the indexing problems for these two files only occur when I try to index the coordinate field. I've checked the data in the coordinate fields, but there is nothing special in terms of odd coordinate pairs, typos, etc. All of the coordinate combinations we use in these files have been successfully indexed in other files on VuFind 3.1.3.

 

Also, the indexing itself goes fine; the out-of-memory errors happen when the records are sent to Solr at the commit stage. There is also no consistent failure point… the file stops indexing at a different set of records each time, which makes it very difficult to say which record or set of records is causing the issue.

 

I’ve also tried the following – all to no avail (an example invocation is shown after this list):

1. Changed the -Dsolrmarc.indexer.chunksize option – tried 1, 5, 50, 100, 500
2. Changed the autoCommit time to 60 sec
3. Changed the number of threads: -Dsolrmarc.solrj.threadcount=8
4. Upgraded the JVM/Java to 8 (java version "1.8.0_121", Java(TM) SE Runtime Environment, build 1.8.0_121-b13) – the issue here is that I can’t get Solr to recognize the Java upgrade, so it still points to the Java 7 instance.
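
For the record, items 1 and 3 were passed via the EXTRA_SOLRMARC_SETTINGS variable Demian suggested; a typical run looked roughly like this (the exact values varied between attempts):

EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.indexer.chunksize=100 -Dsolrmarc.solrj.threadcount=8" ./import-marc.sh /incoming/processed/importrecords.mrc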

 

The error I’m consistently getting is the following (except that the document id is never the same one!):

2017-03-25 07:49:05.143 ERROR (qtp1385340628-14) [   x:biblio] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id 120336 to the index; possible analysis error.
….
Caused by: java.lang.OutOfMemoryError: Java heap space
       at org.apache.lucene.util.fst.BytesStore.writeByte(BytesStore.java:91)
       at org.apache.lucene.util.fst.FST.<init>(FST.java:295)
       at org.apache.lucene.util.fst.Builder.<init>(Builder.java:172)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock.compileIndex(BlockTreeTermsWriter.java:594)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:775)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:1085)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:1046)
       at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:456)
       at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:198)
       at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:107)
       at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:126)
       at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:422)
       at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:503)
       at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:357)
       at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:436)
       at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
       at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
       at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
       at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
       at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
       at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
       at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
       at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1086)
       at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:709)
       at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
       at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
       at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
       at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
       at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:260)
       at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
       at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
       at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:145)

 

Uwe, you mentioned sending along the solr.log and solr_gc.log. I'm happy to send those to you off-list if you have a chance to look at them.

 

Thanks again everyone for any help you can provide or suggestions for where I should look next.

 

Kind regards,
Leila

From: Robert Haschart [mailto:[hidden email]] 
Sent: Thursday, March 23, 2017 11:06 AM
To: [hidden email]; Leila Gonzales; [hidden email]
Subject: Re: [solrmarc-tech] Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

 

Weighing in on this part of Demian's message.

I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

The number of records sent in a chunk is 640. You can set the system property that controls this value on the command line like so:

-Dsolrmarc.indexer.chunksize=100

That system property (and two others that weren't documented) is now documented on the wiki page Demian pointed to.

-Bob Haschart

On 3/23/2017 8:07 AM, Demian Katz wrote:

Leila,

 

I'm copying this to solrmarc-tech in case Bob has anything to add.

 

The biggest difference between the SolrMarc in VuFind 2.4.1 and the SolrMarc in 3.1.3 is that the newer version posts many documents to Solr at once, while the older version took a "one at a time" approach. I would guess that receiving large batches of records may be contributing to the memory error, though, as Uwe said in his reply to you, I wouldn't expect 27,808 records to cause a problem; I've successfully indexed hundreds of thousands. Of course, if any of these records are extremely large and complex, that could make a difference.

 

It might be interesting to see if using SolrMarc's multi-threaded mode makes any difference -- I wonder if tweaking the thread settings could help. You can read about thread options here:

 

 

You can use these options through the EXTRA_SOLRMARC_SETTINGS environment variable -- e.g.

 

EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.solrj.threadcount=1 -Dsolrmarc.indexer.threadcount=2" ./import-marc.sh ...

 

(Note that the values in that example aren't a suggestion, just an example).

 

I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

 

Good luck! Let us know if you need more help!

 

- Demian

