
Large browse indexes: SQLite limits?


Large browse indexes: SQLite limits?

Tod Olson
Has anyone run into size limits building the browse indexes?

We're having a problem building our browse indexes; the first one blows up:

+ java -Dfile.encoding=UTF-8 -cp 'browse-indexing-multi-title.jar:../solr/lib/*' CreateBrowseSQLite sorted-title.tmp title_browse.db
Exception in thread "main" java.sql.SQLException: disk I/O error
        at org.sqlite.DB.execute(DB.java:275)
        at org.sqlite.DB.executeUpdate(DB.java:281)
        at org.sqlite.Stmt.executeUpdate(Stmt.java:103)
        at CreateBrowseSQLite.buildOrderedTables(CreateBrowseSQLite.java:115)
        at CreateBrowseSQLite.create(CreateBrowseSQLite.java:139)
        at CreateBrowseSQLite.main(CreateBrowseSQLite.java:154)

The BrowseIndexer code that blows up is:

        stat.executeUpdate ("create table headings " +
                            "as select * from all_headings order by key;");

All of our 8M+ rows got into all_headings, but we blow up creating headings. We're nowhere near filling the disc, so I figure this is maybe some kind of Java or SQLite limitation. The created DB file is about 1.8GB, which sort of feels like a 32-bit limit. If this were an out-of-memory error, I would expect something other than an I/O error. Has anyone run into this?
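At small scale the failing statement behaves fine, which is part of what makes this puzzling. Here's a toy-scale sketch of that step using Python's stdlib sqlite3 module (table and column names follow the BrowseIndexer snippet above; the data is made up):

```python
import sqlite3

# Toy-scale sketch of the step that fails at ~1.8GB: clone a table in
# sorted order with "create table ... as select ... order by".
conn = sqlite3.connect(":memory:")
conn.execute("create table all_headings (key text, heading text)")
conn.executemany(
    "insert into all_headings values (?, ?)",
    [("b", "Beta"), ("c", "Gamma"), ("a", "Alpha")],
)
conn.execute("create table headings as select * from all_headings order by key")

# Reading back in rowid order shows the clone was written key-sorted.
rows = [key for (key,) in conn.execute("select key from headings order by rowid")]
print(rows)  # -> ['a', 'b', 'c']
```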

-Tod
------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_sfd2d_oct
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech

Re: Large browse indexes: SQLite limits?

Mark Triggs-3
Hi Tod,

I tried a similar experiment here--created a 2GB+ table and then cloned
it using 'create table ... as select ...', but that worked as expected.
Are you running on a 32-bit platform?  If so, I might need to repeat my
test on a virtual machine to see what you're seeing.

One thing I did notice: while it was creating the sorted table, SQLite
was writing a pretty large temporary file into /var/tmp, even though my
test database was on a different filesystem.  Is there any chance that
one of your system disks is quietly filling up while the browse build is
happening on your other filesystem?

We definitely had browse indexes bigger than 2GB when I was working at
the NLA, so don't despair just yet :)

Cheers,

Mark



Tod Olson <[hidden email]> writes:

> [...]

--
Mark Triggs
<[hidden email]>


Re: Large browse indexes: SQLite limits?

Tod Olson
Good question about the 32-bit environment. We're running with the
OpenJDK 64-Bit Server VM, though maybe I should give it the -d64
switch, just to be safe.

But this also has me thinking about the import process. Since we have
a sorted file, it seems like we should be able to bulk import into
SQLite and not have to create headings from all_headings. The .import
command might work, if it preserves the order of the data.
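For what it's worth, the bulk-import idea can be sketched with Python's stdlib sqlite3 module as a stand-in for the CLI's .import (the tab-delimited file layout and the data here are hypothetical): rows land with ascending rowids, so the input file's order is recoverable.

```python
import os
import sqlite3
import tempfile

# Hypothetical pre-sorted, tab-delimited input (key<TAB>heading),
# standing in for something like sorted-title.tmp.
sorted_lines = ["a\tAlpha\n", "b\tBeta\n", "c\tGamma\n"]
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.writelines(sorted_lines)

# Bulk-load the file straight into the table, no "order by" needed.
conn = sqlite3.connect(":memory:")
conn.execute("create table headings (key text, heading text)")
with open(path) as f:
    conn.executemany(
        "insert into headings values (?, ?)",
        (line.rstrip("\n").split("\t", 1) for line in f),
    )
os.unlink(path)

# Rowids follow insertion order, so the file's order survives.
keys = [k for (k,) in conn.execute("select key from headings order by rowid")]
print(keys)  # -> ['a', 'b', 'c']
```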

-Tod


On Nov 1, 2012, at 6:20 AM, Mark Triggs <[hidden email]> wrote:

> [...]



Re: Large browse indexes: SQLite limits?

Tod Olson
Definitely a 64-bit environment; I tried -d64 to be certain. I also tried
OpenJDK 1.7, and running the create from the command-line client always gives
a disk I/O error. Actually running out of space seems unlikely: it's one
big ZFS partition with a lot of free space, and /tmp and /var/tmp are on
the same partition. So I'm a bit flummoxed on that side.

I was also looking at just bulk importing the sorted file with the
.import command; .import seems to preserve the order of the input file.
But looking at a small set of test data (65 records), it seems like
sorted_title.tmp (created by sort(1)) is in a slightly different order
than the headings table created by the "create … select from … order by"
SQL statement, which I find puzzling since sort is working from the keys.
Here's a fragment:

Jys/BE1JJ0FLJytNN ACM transactions on graphics.
JysrQ09BTTdBMwQxQ Accounting for fundamentalisms : the dynamic character of movements / edited by Martin E. Marty and R. Scott Appleby ; sponsored by the American Academy of Arts and Sciences.
Jz0vKwEIAA==ALEC
K0M/P0NBUy8nPU01B Commonwealth Edison Company, petitioner, v. Pollution Control Board of the State of Illinois and the Environmental Law Society of the University of Chicago Law School, respondents [electronic resource] : excerpts from record.
K0M/RUNLN0EzBD9PS Composing musick in parts.
K0MtLwRDMQQxLy0vS Code of federal regulations. 48, Federal acquisition regulations system [electronic resource].
K0NJQS89NzcENydBS Cornelii Iansenii Episcopi Gandavensis in Proverbia Salomonis accuratissima et doctissima commentaria, in quibus vulgata lectio [sic] tractatur, ut et diligens fiat collatio cum originalibus Hebrµis, & literalis simul cum mystico sensus tradatur. Accesserunt eiusdem annotationibus luculentissimµ in librum Sapientiµ Salomonis.
K0NJQS89NzcEOSdBS Cornelii Jansenii Episcopi gandavensis Paraphrasis in omnes Psalmos Davidicos, cum argumentis et annotationibus : itemq in Proverbia, & Ecclesiasticum commentaria veterisq Testamenti ecclesiae cantica, ac in sapientiam notae ... / Cornelii Iansenii Episcopi Gandavensis ; cum indice rerum & verborum locupletissimo ...
K0lPS0NBNydBJwQEQ Crusoniana; or, Truth versus fiction, elucidated in a history of the islands of Juan Fernandez. By the retired governor of that colony ...
KT03KzsEN0EELTcvB Blick in die pädagogischen Zeitschriften
KTMEBFEEBB4kBARBQ BG (-v. 69, no. 4)
KTcpPTdDM0knRTU3L Bibliographie Pädagogik

Of course the point of running the data through sort -u is to de-duplicate the data, but it's a bit confusing that the early titles come out A, C, B, and that sorting on the keys when creating the headings table gives the correct order.
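One data point on the ordering question: SQLite's default text collation is BINARY, i.e. a plain memcmp, so its "order by" should agree with a bytewise sort — which is what sort(1) does under LC_ALL=C, but not necessarily under other locales. A quick check, using keys truncated from the fragment above:

```python
import sqlite3

# SQLite's default BINARY collation is a memcmp, so "order by key"
# should match Python's bytewise string sort exactly. Keys truncated
# from the fragment above.
keys = ["K0Mt", "KT03", "K0M/P", "Jys/", "K0M/R"]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (key text)")
conn.executemany("insert into t values (?)", [(k,) for k in keys])
sqlite_order = [k for (k,) in conn.execute("select key from t order by key")]

print(sqlite_order)  # -> ['Jys/', 'K0M/P', 'K0M/R', 'K0Mt', 'KT03']
print(sqlite_order == sorted(keys))  # -> True: same as a bytewise sort
```

If sort(1) ran under a non-C locale, its collation rules (e.g. ignoring punctuation) could easily shuffle keys like these relative to the memcmp order.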

Enough of this for today, I'll attack it again later. But any ideas would be welcome.

-Tod

On Nov 1, 2012, at 8:24 AM, Tod Olson <[hidden email]> wrote:

> [...]



Re: Large browse indexes: SQLite limits?

Mark Triggs-3
Hi Tod,

That's all a bit strange.  It feels a bit like this might be an OS-level
issue rather than something happening within SQLite itself.  When you
tested from the command-line client, do you mean using the sqlite3
C-based client?  If so, that seems to suggest it's not an issue specific
to SQLiteJDBC (although you never know...)

As a basic sanity check, can you create a 2GB+ file on that filesystem
through any other means?  Does something like:

  mkfile 3072m junk.dat

also fail at the 2GB mark on that filesystem?
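(mkfile is Solaris-specific; a portable stand-in for the same sanity check is sketched below. It uses a sparse file, so it exercises only the filesystem's size limit, not actual free space.)

```python
import os
import tempfile

# Portable stand-in for "mkfile 3072m junk.dat": create a sparse file
# whose logical size crosses the 2GB mark and confirm the filesystem
# reports it. Sparse, so almost no real disk space is consumed.
target = 3 * 1024**3  # 3 GiB, past any signed 32-bit file-size limit

fd, path = tempfile.mkstemp(suffix=".dat")
with os.fdopen(fd, "wb") as f:
    f.seek(target - 1)
    f.write(b"\0")

size = os.path.getsize(path)
os.unlink(path)
print(size >= 2**31)  # -> True if the filesystem handles >2GB files
```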

Also, it might be interesting to see your failing DB creation command
running under truss.  I don't have access to a Solaris machine at the
moment, but something like:

  truss -f -vall -rall -wall ./index-alphabetic-browse.sh 2>truss.log

(for example) might dump some useful information to truss.log that would
show which syscall is throwing the IO error.  I'm happy to lend another
pair of eyes if you want to send me that file, too.

And yeah, the sort ordering issue is a strange one.  As you say, the
primary purpose of that initial sort is to deduplicate.  I felt it best
to do the final sort within SQLite because it's the sole source of truth
as far as the ordering goes, and I was worried that the various Unix
sorts and MS Windows sort might give differing results based on locale.
I guess that proves that :)

Let me know if I can lend a hand,

Mark


Tod Olson <[hidden email]> writes:

> [...]

--
Mark Triggs
<[hidden email]>


Re: Large browse indexes: SQLite limits?

Demian Katz
In reply to this post by Tod Olson
Regarding the sort issue, is it possible that ICU sort keys are designed to be sorted alpha-then-numeric rather than the more common numeric-then-alpha?  I can't explain why that would be the case, but it might explain what you are seeing.

- Demian
________________________________________
From: Tod Olson [[hidden email]]
Sent: Thursday, November 01, 2012 7:07 PM
To: Mark Triggs
Cc: [hidden email]   Mailinglist Tech
Subject: Re: [VuFind-Tech] Large browse indexes: SQLite limits?

[...]

Re: Large browse indexes: SQLite limits?

Tod Olson
In reply to this post by Mark Triggs-3
Here's the summary of today's exploits:

1. We seem to be running all 64-bit executables, and there are no 32-bit file limits in the OS. I could create a 4GB db file with sqlite3 just using .import.

2. The "create headings … order by" step fails for both the Java code and the sqlite3 command-line client. They both claim a "disk I/O error". I kind of think they are mis-reporting, actually, but can't prove it.

3. The sqlite3 client succeeds in the create if I remove the "order by", so something is going on with the sorting.

4. ulimit -a says I should not be running out of any resource, and the detailed SQLite limits document doesn't indicate a likely culprit.

Sifting through more of the details:

I did truss the run of CreateBrowseSQLite like so (this is FreeBSD truss):

        truss -faeD -o CreateBrowseSQLite-truss.log /usr/local/openjdk6/bin/java -Dfile.encoding=UTF-8 -cp 'browse-indexing.jar:../solr/lib/*' CreateBrowseSQLite sorted-title.tmp title_browse.db > title_browse.log 2>&1 &

Looking at the log, I can find where the program prints the exception message and stack trace, but it is unclear which of the preceding system calls could have triggered the exception. The zipped truss log is here if you want to take a look:

        http://www2.lib.uchicago.edu/~tod/vf/CreateBrowseSQLite-truss.log.gz

I also tried trussing the command-line client, sqlite3, running the following commands from a file:

        drop table if exists headings;
        create table headings as select * from all_headings order by key;
        create index keyindex on headings (key);

It also fails on that second SQL statement, reporting a disk I/O error. That truss log for sqlite3 is here:

        http://www2.lib.uchicago.edu/~tod/vf/sqlite-truss.log.gz

[***Here's where it starts to look like something.***]

In this case, the truss of sqlite3 shows a read with an invalid argument:

        read(5,0x800f64108,-1834983915) ERR#22 'Invalid argument'

File descriptor 5 is a temp file, in this case /var/tmp/etilqs_Rcm1nmODbgDXmJa. At this point in the truss output, the process has been doing a lot of lseek() and read() on title_browse.db, then a lot of lseek() and write() on the temp file. But then it suddenly tries to read from the temp file and gets an error.

So that third argument to read is of type size_t, which is 64 bits in this environment (truss just doesn't know types). But the FreeBSD read(2) man page says that EINVAL (errno 22) will result if that argument is > INT_MAX (= 2147483647), which seems to be the case here.

So *maybe* there's something not well-checked in the SQLite "order by" code, or *maybe* there's something about the FreeBSD read() system call accepting a size_t (64-bit) byte count only if it would fit as a non-negative 32-bit signed int. Neither option makes me happy.
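The negative count in that truss line is consistent with this reading: it's the requested byte count shown through a signed 32-bit lens, and reinterpreting it as unsigned gives a value past INT_MAX. A quick check (only the low 32 bits of the size_t are recoverable from the truss output):

```python
import ctypes

# truss printed: read(5,0x800f64108,-1834983915) ERR#22
# Reinterpret the displayed (signed 32-bit) count as unsigned to
# recover the low 32 bits of the byte count SQLite actually requested.
shown_by_truss = -1834983915
requested = ctypes.c_uint32(shown_by_truss).value

print(requested)               # -> 2459983381
print(requested > 2147483647)  # -> True: past INT_MAX, hence EINVAL
```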

If I remove the "order by key" from the create, we have success. So I think figuring out how to sort outside of SQLite is likely the way to go.

-Tod


On Nov 1, 2012, at 6:40 PM, Mark Triggs <[hidden email]>
 wrote:

> Hi Tod,
>
> That's all a bit strange.  It feels a bit like this might an OS-level
> issue rather than something happening within SQLite itself.  When you
> tested from the command-line client, do you mean using the sqlite3
> C-based client?  If so, that seems to suggest it's not an issue specific
> to SQLiteJDBC (although you never know...)
>
> As a basic sanity check, can you create a 2GB+ file on that filesystem
> through any other means?  Does something like:
>
>  mkfile 3072m junk.dat
>
> also fail at the 2GB mark on that filesystem?
>
> Also, it might be interesting to see your failing DB creation command
> running under truss.  I don't have access to a Solaris machine at the
> moment, but something like:
>
>  truss -f -vall -rall -wall ./index-alphabetic-browse.sh 2>truss.log
>
> (for example) might dump some useful information to truss.log that would
> show which syscall is throwing the IO error.  I'm happy to lend another
> pair of eyes if you want to send me that file, too.
>
> And yeah, the sort ordering issue is a strange one.  As you say, the
> primary purpose of that initial sort is to deduplicate.  I felt it best
> to do the final sort within SQLite because it's the sole source of truth
> as far as the ordering goes, and I was worried that the various Unix
> sorts and MS Windows sort might give differing results based on locale.
> I guess that proves that :)
>
> Let me know if I can lend a hand,
>
> Mark
>
>
> Tod Olson <[hidden email]> writes:
>
>> Definitely a 64-bit environment, I tried -d64 to be certain. Also tried
>> OpenJDK 1.7, and running the create from the command-line client, always
>> a disk I/O error. Actually running out of space seems unlikely. It's one
>> big ZFS partition, with a lot of free space. /tmp and /var/tmp all on
>> the same partition. So I'm a bit flummoxed on that side.
>>
>> I was also looking at just bulk importing the sorted file with the
>> .import command. import seems to preserve the order of the input file.
>> But Looking at a small set of test data (65 records) it seems like the
>> sorted_title.tmp (created by sort(1)) is in a slightly different order
>> than the headings table created by the "create … select from … order by"
>> SQL statement. Which i find puzzling since sort is working from the keys.
>> Here's a fragment:
>>
>> Jys/BE1JJ0FLJytNN ACM transactions on graphics.
>> JysrQ09BTTdBMwQxQ Accounting for fundamentalisms : the dynamic character of movements / edited by Martin E. Marty and R. Scott Appleby ; sponsored by the American Academy of Arts and Sciences.
>> Jz0vKwEIAA==ALEC
>> K0M/P0NBUy8nPU01B Commonwealth Edison Company, petitioner, v. Pollution Control Board of the State of Illinois and the Environmental Law Society of the University of Chicago Law School, respondents [electronic resource] : excerpts from record.
>> K0M/RUNLN0EzBD9PS Composing musick in parts.
>> K0MtLwRDMQQxLy0vS Code of federal regulations. 48, Federal acquisition regulations system [electronic resource].
>> K0NJQS89NzcENydBS Cornelii Iansenii Episcopi Gandavensis in Proverbia Salomonis accuratissima et doctissima commentaria, in quibus vulgata lectio [sic] tractatur, ut et diligens fiat collatio cum originalibus Hebrµis, & literalis simul cum mystico sensus tradatur. Accesserunt eiusdem annotationibus luculentissimµ in librum Sapientiµ Salomonis.
>> K0NJQS89NzcEOSdBS Cornelii Jansenii Episcopi gandavensis Paraphrasis in omnes Psalmos Davidicos, cum argumentis et annotationibus : itemq in Proverbia, & Ecclesiasticum commentaria veterisq Testamenti ecclesiae cantica, ac in sapientiam notae ... / Cornelii Iansenii Episcopi Gandavensis ; cum indice rerum & verborum locupletissimo ...
>> K0lPS0NBNydBJwQEQ Crusoniana; or, Truth versus fiction, elucidated in a history of the islands of Juan Fernandez. By the retired governor of that colony ...
>> KT03KzsEN0EELTcvB Blick in die pädagogischen Zeitschriften
>> KTMEBFEEBB4kBARBQ BG (-v. 69, no. 4)
>> KTcpPTdDM0knRTU3L Bibliographie Pädagogik
>>
>> Of course the point of running the data through sort -u is to deduplicate the data, but it's a bit confusing that the early titles come out A, C, B, and that sorting on the keys when creating the headings table gives the correct order.
>>
>> Enough of this for today, I'll attack it again later. But any ideas would be welcome.
>>
>> -Tod
>
> --
> Mark Triggs
> <[hidden email]>
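[Editor's aside on the ordering question above: SQLite's default BINARY collation compares raw bytes, memcmp-style, while sort(1) follows the locale's collation rules unless something like LC_ALL=C is set, so the two can legitimately disagree. A minimal Python sketch of the byte-order side; the locale side varies by system, so it is only described in the comment:]

```python
# SQLite's default BINARY collation orders by raw byte value, so all
# uppercase ASCII sorts before all lowercase. A locale-aware sort(1)
# (e.g. under en_US.UTF-8) typically interleaves the cases instead,
# which is one way sort(1) output and an "order by" can disagree.
keys = [b"apple", b"Banana", b"cherry"]

binary_order = sorted(keys)  # memcmp-style, like SQLite's BINARY collation
print(binary_order)          # [b'Banana', b'apple', b'cherry']
```

[Forcing LC_ALL=C on the external sort is the usual way to make sort(1) agree with a byte-wise collation.]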


------------------------------------------------------------------------------
LogMeIn Central: Instant, anywhere, Remote PC access and management.
Stay in control, update software, and manage PCs from one command center
Diagnose problems and improve visibility into emerging IT issues
Automate, monitor and manage. Do more in less time with Central
http://p.sf.net/sfu/logmein12331_d2d
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Large browse indexes: SQLite limits?

Tod Olson
Resolution: I was running into a 32-bit integer overflow error that was fixed on Oct. 26 and will be in SQLite 3.7.15:

        http://www.sqlite.org/src/info/e24ba5bee4

(and my thanks to the sqlite-users list).

-Tod
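[Editor's note: the symptom is consistent with that overflow. The failing syscall in the truss output below, read(5, 0x800f64108, -1834983915), passes exactly the value you get by truncating a ~2.29 GiB byte count to a signed 32-bit integer. An illustrative Python sketch of the wrap, not SQLite's actual code:]

```python
# Illustrative only: how a >2 GiB byte count wraps when truncated to a
# signed 32-bit integer, matching the negative length in the truss
# output (read(5, 0x800f64108, -1834983915)). Not SQLite's actual code.
import ctypes

nbytes = 2_459_983_381           # a ~2.29 GiB read size, as a 64-bit value
as_int32 = ctypes.c_int32(nbytes).value

print(as_int32)                  # -1834983915: the "disk I/O error" in disguise
assert as_int32 == nbytes - 2**32
```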

On Nov 2, 2012, at 4:50 PM, Tod Olson <[hidden email]>
 wrote:

> Here's the summary of today's exploits:
>
> 1. We seem to be running all 64-bit executables. No 32-bit file limits in the OS. And I could create a 4GB db file with sqlite3 just using .import.
>
> 2. The "create headings … order by" step fails for both the Java code and the sqlite3 command-line client. Both claim a "disk I/O error". I suspect they are mis-reporting, actually, but can't prove it.
>
> 3. The sqlite3 client succeeds in the create if I remove the "order by", so something is going on with the sorting.
>
> 4. ulimit -a says I should not be running out of any resource, and the detailed SQLite limits document doesn't indicate a likely culprit.
>
> Sifting through more of the details:
>
> I did truss the run of CreateBrowseSQLite like so (this is FreeBSD truss):
>
> truss -faeD -o CreateBrowseSQLite-truss.log /usr/local/openjdk6/bin/java -Dfile.encoding=UTF-8 -cp 'browse-indexing.jar:../solr/lib/*' CreateBrowseSQLite sorted-title.tmp title_browse.db > title_browse.log 2>&1 &
>
> Looking at the log, I can find where the program prints the exception message and stack trace, but it is unclear which of the previous systems calls could have triggered the exception. The zipped truss log is here if you want to take a look:
>
> http://www2.lib.uchicago.edu/~tod/vf/CreateBrowseSQLite-truss.log.gz
>
> I also tried trussing the command-line client, sqlite3, running the following commands from a file:
>
> drop table if exists headings;
> create table headings as select * from all_headings order by key;
> create index keyindex on headings (key);
>
> It also fails on that second SQL statement, reporting a disk I/O error. That truss log for sqlite3 is here:
>
> http://www2.lib.uchicago.edu/~tod/vf/sqlite-truss.log.gz
>
> [***Here's where it starts to look like something.***]
>
> In this case, the truss of sqlite3 shows a read with an invalid argument:
>
> read(5,0x800f64108,-1834983915) ERR#22 'Invalid argument'
>
> File descriptor 5 is a temp file, in this case /var/tmp/etilqs_Rcm1nmODbgDXmJa. At this point in the truss output, the process has been doing a lot of lseek() and read() on title_browse.db, then a lot of lseek() and write() on the temp file. But then it suddenly tries to read from the temp file and gets an error.
>
> So that third argument to read() is of type size_t, which is 64 bits in this environment (truss just doesn't know the types). But the FreeBSD read(2) man page says that EINVAL (errno 22) results if that argument is > INT_MAX (= 2147483647), which seems to be the case here.
>
> So *maybe* there's something not well-checked in the SQLite "order by" code, or *maybe* there's something about the FreeBSD read() system call accepting a size_t (64-bit) count, but only if it fits as a non-negative 32-bit signed int. Neither option makes me happy.
>
> If I remove the "order by key" from the create, we have success. So I think figuring out how to sort outside of SQLite is likely the way to go.
>
> -Tod
>
>
> On Nov 1, 2012, at 6:40 PM, Mark Triggs <[hidden email]>
> wrote:
>
>> Hi Tod,
>>
>> That's all a bit strange.  It feels a bit like this might be an OS-level
>> issue rather than something happening within SQLite itself.  When you
>> tested from the command-line client, do you mean using the sqlite3
>> C-based client?  If so, that seems to suggest it's not an issue specific
>> to SQLiteJDBC (although you never know...)
>>
>> As a basic sanity check, can you create a 2GB+ file on that filesystem
>> through any other means?  Does something like:
>>
>> mkfile 3072m junk.dat
>>
>> also fail at the 2GB mark on that filesystem?
>>
>> Also, it might be interesting to see your failing DB creation command
>> running under truss.  I don't have access to a Solaris machine at the
>> moment, but something like:
>>
>> truss -f -vall -rall -wall ./index-alphabetic-browse.sh 2>truss.log
>>
>> (for example) might dump some useful information to truss.log that would
>> show which syscall is throwing the IO error.  I'm happy to lend another
>> pair of eyes if you want to send me that file, too.
>>
>> And yeah, the sort ordering issue is a strange one.  As you say, the
>> primary purpose of that initial sort is to deduplicate.  I felt it best
>> to do the final sort within SQLite because it's the sole source of truth
>> as far as the ordering goes, and I was worried that the various Unix
>> sorts and MS Windows sort might give differing results based on locale.
>> I guess that proves that :)
>>
>> Let me know if I can lend a hand,
>>
>> Mark
>>
>>
>> Tod Olson <[hidden email]> writes:
>>
>>> Definitely a 64-bit environment, I tried -d64 to be certain. Also tried
>>> OpenJDK 1.7, and running the create from the command-line client, always
>>> a disk I/O error. Actually running out of space seems unlikely. It's one
>>> big ZFS partition, with a lot of free space. /tmp and /var/tmp all on
>>> the same partition. So I'm a bit flummoxed on that side.
>>>
>>> I was also looking at just bulk importing the sorted file with the
>>> .import command, which seems to preserve the order of the input file.
>>> But looking at a small set of test data (65 records), it seems like the
>>> sorted_title.tmp (created by sort(1)) is in a slightly different order
>>> than the headings table created by the "create … select from … order by"
>>> SQL statement, which I find puzzling since sort is working from the keys.
>>> Here's a fragment:
>>>
>>> Jys/BE1JJ0FLJytNN ACM transactions on graphics.
>>> JysrQ09BTTdBMwQxQ Accounting for fundamentalisms : the dynamic character of movements / edited by Martin E. Marty and R. Scott Appleby ; sponsored by the American Academy of Arts and Sciences.
>>> Jz0vKwEIAA==ALEC
>>> K0M/P0NBUy8nPU01B Commonwealth Edison Company, petitioner, v. Pollution Control Board of the State of Illinois and the Environmental Law Society of the University of Chicago Law School, respondents [electronic resource] : excerpts from record.
>>> K0M/RUNLN0EzBD9PS Composing musick in parts.
>>> K0MtLwRDMQQxLy0vS Code of federal regulations. 48, Federal acquisition regulations system [electronic resource].
>>> K0NJQS89NzcENydBS Cornelii Iansenii Episcopi Gandavensis in Proverbia Salomonis accuratissima et doctissima commentaria, in quibus vulgata lectio [sic] tractatur, ut et diligens fiat collatio cum originalibus Hebrµis, & literalis simul cum mystico sensus tradatur. Accesserunt eiusdem annotationibus luculentissimµ in librum Sapientiµ Salomonis.
>>> K0NJQS89NzcEOSdBS Cornelii Jansenii Episcopi gandavensis Paraphrasis in omnes Psalmos Davidicos, cum argumentis et annotationibus : itemq in Proverbia, & Ecclesiasticum commentaria veterisq Testamenti ecclesiae cantica, ac in sapientiam notae ... / Cornelii Iansenii Episcopi Gandavensis ; cum indice rerum & verborum locupletissimo ...
>>> K0lPS0NBNydBJwQEQ Crusoniana; or, Truth versus fiction, elucidated in a history of the islands of Juan Fernandez. By the retired governor of that colony ...
>>> KT03KzsEN0EELTcvB Blick in die pädagogischen Zeitschriften
>>> KTMEBFEEBB4kBARBQ BG (-v. 69, no. 4)
>>> KTcpPTdDM0knRTU3L Bibliographie Pädagogik
>>>
>>> Of course the point of running the data through sort -u is to deduplicate the data, but it's a bit confusing that the early titles come out A, C, B, and that sorting on the keys when creating the headings table gives the correct order.
>>>
>>> Enough of this for today, I'll attack it again later. But any ideas would be welcome.
>>>
>>> -Tod
>>
>> --
>> Mark Triggs
>> <[hidden email]>
>



Re: Large browse indexes: SQLite limits?

Mark Triggs-3
Ah, excellent.  With any luck SQLiteJDBC will pull that in too (or we
can always attempt our own build).

Mark


Tod Olson <[hidden email]> writes:

> Resolution: I was running into a 32-bit integer overflow error that was fixed on Oct. 26 and will be in SQLite 3.7.15:
>
> http://www.sqlite.org/src/info/e24ba5bee4
>
> (and my thanks to the sqlite-users list).
>
> -Tod

--
Mark Triggs
<[hidden email]>


Re: Large browse indexes: SQLite limits?

Tod Olson
Sadly, no. SQLiteJDBC bundles the native drivers directly into the jar file. Here's some more of what I found:

The new sqlite3 (built from the source at the link below) does let me build the headings table from the command line.[1] And I can query it. There are three selects that the browse-handler code executes, and I can execute them by hand via sqlite3. But when the browse handler is queried for title browse data, it returns an HTTP 500 error with the message "database disk image is malformed"; however, it is quite happy with the smaller browse indexes.

I did look at compiling up a new SQLiteJDBC. The driver that ships with VuFind seems to be from 2008 (based on opening up the jar file) so could probably use an update anyhow. It looks like the current SQLiteJDBC is at the Xerial project[2]. When building from source[3], the build process downloads the SQLite 3.7.8 code. (3.7.15 will be released on 12/12.) The build is structured such that it is not immediately obvious to me how to substitute in the patched code without reworking some of the Makefile.

I'm not really certain what my next action should be. Probably work on the SQLiteJDBC recompile. If anyone is familiar with the build process, I could use some tips on how to get the desired SQLite source into the build.

-Tod

[1] headings and all_headings are each 9,173,673 rows, the title_browse.db is 5.2GB.
[2] http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC
[3] http://code.google.com/p/sqlite-jdbc/source/checkout
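[Editor's note: for anyone wanting to sanity-check the three rebuild statements on a small scale, Python's stdlib sqlite3 module (which links its own SQLite) can run them against toy data. A minimal sketch; with a handful of rows it obviously cannot reproduce the >2 GB failure, it only checks the statements themselves:]

```python
# Sketch: the same three statements used for the manual rebuild, run
# against toy data via Python's bundled SQLite. The hypothetical rows
# stand in for (sort key, heading) pairs from sorted-title.tmp.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table all_headings (key text, heading text)")
conn.executemany(
    "insert into all_headings values (?, ?)",
    [("c", "Cherry"), ("a", "Apple"), ("b", "Banana")],
)

conn.execute("drop table if exists headings")
conn.execute("create table headings as select * from all_headings order by key")
conn.execute("create index keyindex on headings (key)")

# Rows land in the new table in key order (rowid order = insertion order).
rows = [r[0] for r in conn.execute("select key from headings order by rowid")]
print(rows)  # ['a', 'b', 'c']
```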

On Nov 3, 2012, at 4:58 PM, Mark Triggs <[hidden email]> wrote:

> Ah, excellent.  With any luck SQLiteJDBC will pull that in too (or we
> can always attempt our own build)
>
> Mark
>
>
> Tod Olson <[hidden email]> writes:
>
>> Resolution: I was running into a 32-bit integer overflow error that was fixed on Oct. 26 and will be in SQLite 3.7.15:
>>
>> http://www.sqlite.org/src/info/e24ba5bee4
>>
>> (and my thanks to the sqlite-users list).
>>
>> -Tod
>
> --
> Mark Triggs
> <[hidden email]>


------------------------------------------------------------------------------
LogMeIn Central: Instant, anywhere, Remote PC access and management.
Stay in control, update software, and manage PCs from one command center
Diagnose problems and improve visibility into emerging IT issues
Automate, monitor and manage. Do more in less time with Central
http://p.sf.net/sfu/logmein12331_d2d
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech

Re: Large browse indexes: SQLite limits?

Demian Katz
Not sure if this makes any difference, but there is a notice on the Google Code page that the project has now moved here:

https://bitbucket.org/xerial/sqlite-jdbc

Looks like there's been a bit of new development (as of September).

- Demian

> -----Original Message-----
> From: Tod Olson [mailto:[hidden email]]
> Sent: Wednesday, November 07, 2012 11:17 AM
> To: Mark Triggs
> Cc: [hidden email] Mailinglist Tech
> Subject: Re: [VuFind-Tech] Large browse indexes: SQLite limits?
>
> Sadly, no. The SQLiteJDBC bundles the native drivers directly into the jar
> file. Here's some more of what I find:
>
> The new sqlite3 (built from the source at the link below) does let me build
> the headings table from the command line.[1] And I can query it. There are
> three selects that the browse-handler code executes, and I can execute them by
> hand via sqlite3. But when the browse handler is queried for title browse
> data, it returns an HTTP 500 error with the message "database disk image is
> malformed"; however, it is quite happy with the smaller browse indexes.
>
> I did look at compiling up a new SQLiteJDBC. The driver that ships with VuFind
> seems to be from 2008 (based on opening up the jar file) so could probably use
> an update anyhow. It looks like the current SQLiteJDBC is at the Xerial project[2].
> When building from source[3], the build process downloads the SQLite 3.7.8
> code. (3.7.15 will be released on 12/12.) The build is structured such that it
> is not immediately obvious to me how to substitute in the patched code without
> reworking some of the Makefile.
>
> I'm not really certain what my next action should be. Probably work on the
> SQLiteJDBC recompile. If anyone is familiar with the build process, I could
> use some tips on how to get the desired SQLite source into the build.
>
> -Tod
>
> [1] headings and all_headings are each 9,173,673 rows, the title_browse.db is
> 5.2GB.
> [2] http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC
> [3] http://code.google.com/p/sqlite-jdbc/source/checkout
>
> On Nov 3, 2012, at 4:58 PM, Mark Triggs <[hidden email]> wrote:
>
> > Ah, excellent.  With any luck SQLiteJDBC will pull that in too (or we
> > can always attempt our own build)
> >
> > Mark
> >
> >
> > Tod Olson <[hidden email]> writes:
> >
> >> Resolution: I was running into a 32-bit integer overflow error that was
> fixed on Oct. 26 and will be in SQLite 3.7.15:
> >>
> >> http://www.sqlite.org/src/info/e24ba5bee4
> >>
> >> (and my thanks to the sqlite-users list).
> >>
> >> -Tod
> >
> > --
> > Mark Triggs
> > <[hidden email]>
>
>


Re: Large browse indexes: SQLite limits?

Tod Olson
So, our sysadmin Peggy figured out how to make the FreeBSD ports system
do the heavy lifting on compiling SQLiteJDBC with the updated SQLite
source. Upshot is: with an SQLiteJDBC built against the updated SQLite,
we can now read 5.2 GB dbs! Huzzah!

I'm also rebuilding the indexes from scratch again, just to be certain.
So far so good.

Good to see the more recent activity. I'd taken the source URL from the
project page and overlooked the Google Code notice.

I think that after 12/12 rolls around and SQLite 3.7.15 is released, it
might be worth updating the SQLite JDBC that VuFind ships with.

Best,

-Tod
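[Editor's note: once 3.7.15 is out, the quick sanity check is to ask the linked library for its version, since it is the bundled native SQLite, not the wrapper, that carries the fix. Through JDBC that's "select sqlite_version()"; a hedged Python sketch of the same gate:]

```python
# Sketch: check that the SQLite actually linked into a process carries
# the Oct. 26 overflow fix, i.e. is at least 3.7.15. `has_fix` is a
# hypothetical helper, not part of any shipped API.
import sqlite3

def has_fix(version: str, minimum=(3, 7, 15)) -> bool:
    """Compare a dotted version string against the first fixed release."""
    return tuple(int(part) for part in version.split(".")) >= minimum

print(has_fix("3.7.15"))        # True
print(has_fix("3.7.8"))         # False: the release the Xerial build bundles
print(sqlite3.sqlite_version)   # whatever this interpreter actually links
```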

On Nov 7, 2012, at 10:42 AM, Demian Katz <[hidden email]>
 wrote:

> Not sure if this makes any difference, but there is a notice on the Google Code page that the project has now moved here:
>
> https://bitbucket.org/xerial/sqlite-jdbc
>
> Looks like there's been a bit of new development (as of September).
>
> - Demian
>
>> -----Original Message-----
>> From: Tod Olson [mailto:[hidden email]]
>> Sent: Wednesday, November 07, 2012 11:17 AM
>> To: Mark Triggs
>> Cc: [hidden email] Mailinglist Tech
>> Subject: Re: [VuFind-Tech] Large browse indexes: SQLite limits?
>>
>> Sadly, no. The SQLiteJDBC bundles the native drivers directly into the jar
>> file. Here's some more of what I find:
>>
>> The new sqlite3 (built from the source at the link below) does let me build
>> the headings table from the command line.[1] And I can query it. There are
>> three selects that the browse-handler code executes, and I can execute them by
>> hand via sqlite3. But when the browse handler is queried for title browse
>> data, it returns an HTTP 500 error with the message "database disk image is
>> malformed"; however, it is quite happy with the smaller browse indexes.
>>
>> I did look at compiling up a new SQLiteJDBC. The driver that ships with VuFind
>> seems to be from 2008 (based on opening up the jar file) so could probably use
>> an update anyhow. It looks like the current SQLiteJDBC is at the Xerial project[2].
>> When building from source[3], the build process downloads the SQLite 3.7.8
>> code. (3.7.15 will be released on 12/12.) The build is structured such that it
>> is not immediately obvious to me how to substitute in the patched code without
>> reworking some of the Makefile.
>>
>> I'm not really certain what my next action should be. Probably work on the
>> SQLiteJDBC recompile. If anyone is familiar with the build process, I could
>> use some tips on how to get the desired SQLite source into the build.
>>
>> -Tod
>>
>> [1] headings and all_headings are each 9,173,673 rows, the title_browse.db is
>> 5.2GB.
>> [2] http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC
>> [3] http://code.google.com/p/sqlite-jdbc/source/checkout
>>
>> On Nov 3, 2012, at 4:58 PM, Mark Triggs <[hidden email]> wrote:
>>
>>> Ah, excellent.  With any luck SQLiteJDBC will pull that in too (or we
>>> can always attempt our own build)
>>>
>>> Mark
>>>
>>>
>>> Tod Olson <[hidden email]> writes:
>>>
>>>> Resolution: I was running into a 32-bit integer overflow error that was
>> fixed on Oct. 26 and will be in SQLite 3.7.15:
>>>>
>>>> http://www.sqlite.org/src/info/e24ba5bee4
>>>>
>>>> (and my thanks to the sqlite-users list).
>>>>
>>>> -Tod
>>>
>>> --
>>> Mark Triggs
>>> <[hidden email]>
>>
>>



Re: Large browse indexes: SQLite limits?

Mark Triggs-3
Many thanks for all the investigative work, Tod--that all sounds great
to me.  I've made a note in my diary to check back on the 12th of Dec
and will build and upgrade the browse version of SQLiteJDBC at that
point.

Cheers,

Mark


Tod Olson <[hidden email]> writes:

> So, our sysadmin Peggy figured out how to make the FreeBSD ports system
> to do the heavy lifting on compiling SQLiteJDBC with the updated SQLite
> source. Upshot is: with an SQLiteJDBC build on the updated SQLite, we can
> now read 5.2G dbs! Huzzah!
>
> I'm also rebuilding the indexes from scratch again, just to be certain.
> So far so good.
>
> Good to see the more recent activity. I'd taken the source URL from the
> project page and overlooked the Google Code notice.
>
> I think that after 12/12 rolls around and SQLite 3.7.15 is released, it
> might be worth updating the SQLite JDBC that VuFind ships with.
>
> Best,
>
> -Tod

--
Mark Triggs
<[hidden email]>


Re: Large browse indexes: SQLite limits?

Demian Katz
Thanks, Mark.  Once you've built it to your satisfaction, send me a reminder and I'll do a little extra testing over here and then push it into trunk/master.

- Demian

> -----Original Message-----
> From: Mark Triggs [mailto:[hidden email]]
> Sent: Wednesday, November 07, 2012 2:32 PM
> To: Tod Olson
> Cc: Demian Katz; [hidden email] Mailinglist Tech
> Subject: Re: [VuFind-Tech] Large browse indexes: SQLite limits?
>
> Many thanks for all the investigative work, Tod--that all sounds great
> to me.  I've made a note in my diary to check back on the 12th of Dec
> and will build and upgrade the browse version of SQLiteJDBC at that
> point.
>
> Cheers,
>
> Mark
>
>
> Tod Olson <[hidden email]> writes:
>
> > So, our sysadmin Peggy figured out how to make the FreeBSD ports system
> > to do the heavy lifting on compiling SQLiteJDBC with the updated SQLite
> > source. Upshot is: with an SQLiteJDBC build on the updated SQLite, we can
> > now read 5.2G dbs! Huzzah!
> >
> > I'm also rebuilding the indexes from scratch again, just to be certain.
> > So far so good.
> >
> > Good to see the more recent activity. I'd taken the source URL from the
> > project page and overlooked the Google Code notice.
> >
> > I think that after 12/12 rolls around and SQLite 3.7.15 is released, it
> > might be worth updating the SQLite JDBC that VuFind ships with.
> >
> > Best,
> >
> > -Tod
>
> --
> Mark Triggs
> <[hidden email]>


Re: Large browse indexes: SQLite limits?

Mark Triggs-3
In reply to this post by Mark Triggs-3
Just a note that I haven't forgotten about this, but I need to find some
time to have a think about it.  I tried the official SQLiteJDBC build,
but its bundled Linux build of SQLite requires a glibc version that's
newer than what ships with my (and, I suspect, other people's) distro.

I'm not sure whether I'll need to do a custom build, or whether we can
just tell people to use the latest SQLiteJDBC as a drop-in replacement
if they hit problems on Solaris.  More investigation needed :)

Mark


Mark Triggs <[hidden email]> writes:

> Many thanks for all the investigative work, Tod--that all sounds great
> to me.  I've made a note in my diary to check back on the 12th of Dec
> and will build and upgrade the browse version of SQLiteJDBC at that
> point.
>
> Cheers,
>
> Mark
>
>
> Tod Olson <[hidden email]> writes:
>
>> So, our sysadmin Peggy figured out how to make the FreeBSD ports system
>> to do the heavy lifting on compiling SQLiteJDBC with the updated SQLite
>> source. Upshot is: with an SQLiteJDBC build on the updated SQLite, we can
>> now read 5.2G dbs! Huzzah!
>>
>> I'm also rebuilding the indexes from scratch again, just to be certain.
>> So far so good.
>>
>> Good to see the more recent activity. I'd taken the source URL from the
>> project page and overlooked the Google Code notice.
>>
>> I think that after 12/12 rolls around and SQLite 3.7.15 is released, it
>> might be worth updating the SQLite JDBC that VuFind ships with.
>>
>> Best,
>>
>> -Tod

--
Mark Triggs
<[hidden email]>
