java marcimporter refactor patch

java marcimporter refactor patch

Reuben Pasquini-3
Hello!

I put together a patch that refactors the java marcimporter
code that I hope I can get accepted into the main SVN
(I just pulled the code off the trunk).
I'm writing some other code to pull records out of our Voyager
database and push them into SOLR (VuFind),
so the changes I made were geared at making it easy
for me to use org.voyager.marc.MarcImporter to push
MARC records from Voyager into Solr.

Here's a rundown of the changes:

*. Refactor MarcImporter to implement a VufindImporter interface:

package org.vufind.marc;

import org.marc4j.marc.Record;
import org.apache.solr.update.AddUpdateCommand;

/**
 * Interface for importer that takes a MARC record
 * and generates a Solr add/update command
 */
public interface VufindImporter {

    /** Provide read-only access to this property */
    public String getControlField();
    /** Provide read-only access to this property */
    public char getControlSubfield();

    /**
     * Returns the String for the control field
     * Just make this public for regression testing.
     *
     * @param record Record to retrieve the control field data from
     * @return Value of the control field defined by the user
     */
    public String getControlField(Record record);


    /**
     * Build an add/update command for the given MARC record
     *
     * @param record to build a Solr command for
     * @return a command with the record metadata ready to upload
     */
    public AddUpdateCommand buildAddUpdateCommand(Record record);
}

*. Move main() out of MarcImporter to a CommandLine class.

*. Add a junit test case to the org.vufind.test package,
     and a 'runTest' target to build.xml that runs the test.

*. Misc. changes - just pushing code around to manage logging,
       get rid of globals, add checks to work with VuFind 1.8 Solr
       schema, etc.
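
To illustrate why the interface and the CommandLine split make testing easier, here is a minimal, self-contained sketch. It is not the code in the attached patch: SimpleRecord, RecordImporter and FieldCopyImporter are hypothetical stand-ins (a plain map instead of org.marc4j.marc.Record, a map instead of AddUpdateCommand) so that it compiles without marc4j or Solr on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Entry point kept apart from the importer, mirroring the CommandLine split. */
class CommandLineSketch {
    public static void main(String[] args) {
        SimpleRecord record = new SimpleRecord();
        record.put("001", "ocm12345");
        record.put("245", "An example title");

        RecordImporter importer = new FieldCopyImporter("001");
        System.out.println(importer.buildDocument(record));
    }
}

/** Hypothetical stand-in for a MARC record: one value per tag. */
class SimpleRecord extends LinkedHashMap<String, String> {
    private static final long serialVersionUID = 1L;
}

/** Hypothetical, simplified analogue of the VufindImporter interface. */
interface RecordImporter {
    /** Tag of the field used as the record identifier. */
    String getControlField();

    /** Value of the identifier field in the given record. */
    String getControlField(SimpleRecord record);

    /** Field/value pairs standing in for a Solr add/update command. */
    Map<String, String> buildDocument(SimpleRecord record);
}

/** Toy importer: copies every field, using the control field as the id. */
class FieldCopyImporter implements RecordImporter {
    private final String controlField;

    FieldCopyImporter(String controlField) {
        this.controlField = controlField;
    }

    @Override
    public String getControlField() {
        return controlField;
    }

    @Override
    public String getControlField(SimpleRecord record) {
        String value = record.get(controlField);
        if (value == null) {
            throw new IllegalArgumentException("record has no " + controlField + " field");
        }
        return value;
    }

    @Override
    public Map<String, String> buildDocument(SimpleRecord record) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", getControlField(record));
        for (Map.Entry<String, String> field : record.entrySet()) {
            if (!field.getKey().equals(controlField)) {
                doc.put("marc_" + field.getKey(), field.getValue());
            }
        }
        return doc;
    }
}
```

Because callers only see the interface, a test can build a record in memory and check the result without going through main() or a running Solr instance.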

An svn diff follows, and I've attached a zip file with the code.
I've only tested the code with a couple of records, so
let me know if you spot any bugs.  I didn't
change any logic - just moved things around.
Anyway - I hope you'll accept the patch.
Let me know what you think.

Cheers,
Reuben


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech

Attachments:
svn.diff (113K)
build.xml (13K)
build.properties (252 bytes)
VufindImporter.java (1K)
CommandLine.java (11K)
MarcImporter.java (39K)
VufindImporterTest.java (12K)
AllTests.java (2K)

Re: java marcimporter refactor patch

wsgrah
Administrator
Hi Reuben,

This looks really great. Have you had a look at Solrmarc yet? We're in
the process of migrating all the importing code over to that project, and
we've started working on some of these interfaces to make it a bit
easier. I've also been giving some thought to a couple of different
readers for marc4j: a compressed marc reader (my 6GB of marc compresses
to 2GB in a gzipped tarball), and a slightly more customized JDBC reader
that could pull marc from databases (if your ILS allows such things).
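
As a rough idea of what a compressed reader involves, here is a sketch using only the JDK. It assumes a plain gzip stream of binary (ISO 2709) records rather than a tarball, and it relies on the first five bytes of each record being its total length as ASCII digits. GzipMarcSketch, fakeRecord and gzip are hypothetical names for illustration; the records in main() are fakes, not valid MARC.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipMarcSketch {

    /** Splits a gzip-compressed stream of length-prefixed records into raw byte arrays. */
    static List<byte[]> readRawRecords(InputStream gzipped) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(gzipped))) {
            byte[] prefix = new byte[5];
            int first;
            while ((first = in.read()) >= 0) {          // -1 means a clean end of stream
                prefix[0] = (byte) first;
                in.readFully(prefix, 1, 4);
                int length = parseLength(prefix);
                byte[] record = new byte[length];
                System.arraycopy(prefix, 0, record, 0, 5);
                in.readFully(record, 5, length - 5);    // rest of the record
                records.add(record);
            }
        }
        return records;
    }

    private static int parseLength(byte[] digits) throws IOException {
        String text = new String(digits, StandardCharsets.US_ASCII);
        try {
            int length = Integer.parseInt(text);
            if (length < 5) {
                throw new IOException("implausible record length: " + text);
            }
            return length;
        } catch (NumberFormatException e) {
            throw new IOException("record length is not numeric: " + text, e);
        }
    }

    /** Builds a fake record: 5-digit length, payload, record terminator 0x1D. */
    static byte[] fakeRecord(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.US_ASCII);
        int length = 5 + body.length + 1;
        byte[] out = new byte[length];
        byte[] prefix = String.format(Locale.ROOT, "%05d", length).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(prefix, 0, out, 0, 5);
        System.arraycopy(body, 0, out, 5, body.length);
        out[length - 1] = 0x1D;
        return out;
    }

    /** Compresses the given chunks into one in-memory gzip stream. */
    static byte[] gzip(byte[]... chunks) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            for (byte[] chunk : chunks) {
                gz.write(chunk);
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip(fakeRecord("first"), fakeRecord("second record"));
        List<byte[]> records = readRawRecords(new ByteArrayInputStream(compressed));
        System.out.println(records.size() + " records read");
    }
}
```

With marc4j on the classpath, it may be enough to wrap the file in a GZIPInputStream and hand that to MarcStreamReader, since that reader takes an InputStream; the manual splitting above is only there to keep the sketch free of dependencies.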

One of the really nice things about the solrmarc project is that it
attempts to fix "bad" records (mostly records that claim to be UTF
but aren't), so you can clean up your marc records in the process
of indexing. You also have the ability to reindex your records directly
from the index (in case you want to tweak the schema.xml files).
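
To make the "claims to be UTF but isn't" case concrete, here is a small JDK-only sketch. It is not how solrmarc does it; it only shows one way to detect bytes that are not valid UTF-8 and fall back to another charset. The ISO-8859-1 fallback is an assumption for the demo (legacy MARC data is often MARC-8, which the JDK does not decode), and EncodingRepairSketch and decodeLeniently are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingRepairSketch {

    /** Decodes as strict UTF-8; if the bytes are not valid UTF-8, falls back to ISO-8859-1. */
    static String decodeLeniently(byte[] raw) {
        // REPORT makes the decoder throw instead of silently inserting U+FFFD.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException notUtf8) {
            // Assumed fallback for this sketch only.
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] good = "Caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {'C', 'a', 'f', (byte) 0xE9};   // Latin-1 e-acute, invalid as UTF-8
        System.out.println(decodeLeniently(good).equals(decodeLeniently(bad)));
    }
}
```

Both inputs decode to the same string here, which is the kind of cleanup that can then be written back out as proper UTF-8 during indexing.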

Wayne


--
/**
  * Wayne Graham
  * Earl Gregg Swem Library
  * PO Box 8794
  * Williamsburg, VA 23188
  * 757.221.3112
  * http://swem.wm.edu/blogs/waynegraham/
  */




Re: java marcimporter refactor patch

Reuben Pasquini-3
Hi Wayne,

Thanks for the info.
I knew there was something going on with solrmarc,
but I didn't see any code under:
      svn co https://vufind.svn.sourceforge.net/svnroot/vufind/trunk 
for solrmarc, so I let myself get sidetracked.
I'll take a look at solrmarc - found the Google code page.   I just
need something that exports an API similar to VufindImporter (from my patch)
with a method like:
         public void addOrUpdateMarc( Record record ) ...
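
For what it's worth, a push-style method like that can be layered over a "build a command" call with a thin adapter. The sketch below is only an illustration under assumptions: MarcSink and adapt are hypothetical names, and plain strings stand in for both the MARC record and the Solr command so it runs on the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class AddOrUpdateSketch {

    /** Hypothetical push-style API: one call per record, no command object exposed. */
    interface MarcSink<R> {
        void addOrUpdateMarc(R record);
    }

    /** Combines a "build a command" step and a "send it" step into a MarcSink. */
    static <R, C> MarcSink<R> adapt(Function<R, C> buildCommand, Consumer<C> send) {
        return record -> send.accept(buildCommand.apply(record));
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();

        // Strings stand in for the record type and the command type.
        Function<String, String> build = rec -> "add:" + rec;
        Consumer<String> send = sent::add;

        MarcSink<String> sink = adapt(build, send);
        sink.addOrUpdateMarc("record-1");
        sink.addOrUpdateMarc("record-2");
        System.out.println(sent);
    }
}
```

The caller that pulls records out of the database then only depends on the one-method sink, and the send step can be swapped for a test double.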

BTW - the SVN browser appears to be busted:
       http://vufind.svn.sourceforge.net/viewvc/vufind/

Cheers,
Reuben


In reply to Wayne Graham <[hidden email]>, 7/23/2008 7:56 AM
Re: java marcimporter refactor patch

wsgrah
Administrator
I'll make sure that method gets into solrmarc.

I think SourceForge is still recovering from their svn
migration... really a pain.

Wayne
