Maddening import

Schulkins, Joe

Hi,

 

I'm using VuFind 3.0.3 and I have a maddening XML import which I'm hoping someone can help me with.

I'm using the batch import tool to import XML files into Solr; however, in several hierarchy fields I'm not getting the values I expect.

In my XSL I have the following entries:

 

    <!-- COLLECTION TITLE -->
    <xsl:if test="atom[@name='topTitle']">
        <field name="collection">
            <xsl:value-of select="atom[@name='topTitle'][normalize-space()]"/>
        </field>
    </xsl:if>

    <!-- HIERARCHY BROWSE -->
    <field name="hierarchy_browse">
        <xsl:value-of select="atom[@name='topTitle'][normalize-space()]"/>{{{_ID_}}}<xsl:value-of select="atom[@name='topIrn'][normalize-space()]"/>
    </field>

    <!-- HIERARCHY TOP ID -->
    <field name="hierarchy_top_id">
        <xsl:value-of select="atom[@name='topIrn'][normalize-space()]"/>
    </field>

    <!-- HIERARCHY TOP TITLE -->
    <field name="hierarchy_top_title">
        <xsl:value-of select="atom[@name='topTitle'][normalize-space()]"/>
    </field>

 

And the XML records look like this:

 

<doc name="record">
    <atom name="irn" type="text" size="short">61950</atom>
    <atom name="ObjectType" type="text" size="short">Archives</atom>
    <atom name="EADLevelAttribute" type="text" size="short">Item</atom>
    <atom name="EADUnitID" type="text" size="short">D42.A1.01</atom>
    <atom name="EADUnitTitle" type="text" size="short">Memorandum and Articles of Association</atom>
    <atom name="EADScopeAndContent" type="text" size="short">Also includes Special Resolution of the Cunard Steamship Company Ltd, passed 26 Jun 1969.</atom>
    <atom name="EADBiographyOrHistory" type="text" size="short"/>
    <atom name="EADArrangement" type="text" size="short"/>
    <atom name="EADUnitDate" type="text" size="short">23 May 1878</atom>
    <atom name="EADAccruals" type="text" size="short"/>
    <atom name="EADOtherFindingAid" type="text" size="short"/>
    <atom name="EADRelatedMaterial" type="text" size="short"/>
    <atom name="EADAppraisalInformation" type="text" size="short"/>
    <atom name="EADSeparatedMaterial" type="text" size="short"/>
    <atom name="EADTitleProper" type="text" size="short"/>
    <atom name="EADPublicationStatement" type="text" size="short"/>
    <atom name="EADCustodialHistory" type="text" size="short"/>
    <atom name="EADSource" type="text" size="short"/>
    <atom name="EADNote" type="text" size="short"/>
    <atom name="EADAccessRestrictions" type="text" size="short"/>
    <atom name="EADUseRestrictions" type="text" size="short"/>
    <atom name="topIrn" type="text" size="short">61948</atom>
    <atom name="topTitle" type="text" size="short">Shareholders Records</atom>
    <tuple name="AssParentObjectRef">
      <atom name="EADUnitTitle" type="text" size="short">Memorandum and Articles of Association</atom>
      <atom name="irn" type="text" size="short">61949</atom>
    </tuple>
    <tuple name="EADAcquisitionInformationRef"/>
    <table name="EADExtent_tab">
      <tuple>
        <atom name="EADExtent" type="text" size="short">1 item.</atom>
      </tuple>
    </table>
</doc>

 

 

When I run xsltproc on an import file with my XSL, the output shows that the correct values are being assigned:

 

<doc>
<field name="ead_access"/>
<field name="ead_accruals"/>
<field name="allfields"/>
<field name="ead_appraisal"/>
<field name="ead_arrangement"/>
<field name="ead_biography"/>
<field name="collection">Shareholders Records</field>
<field name="ead_custodial_history"/>
<field name="dateSpan">23 May 1878</field>
<field name="description">Also includes Special Resolution of the Cunard Steamship Company Ltd, passed 26 Jun 1969.</field>
<field name="ead_extent">1 item.</field>
<field name="format">Item</field>
<field name="hierarchy_browse">Shareholders Records{{{_ID_}}}61948</field>
<field name="is_hierarchy_id">61950</field>
<field name="hierarchy_parent_id">61949</field>
<field name="hierarchy_parent_title">Memorandum and Articles of Association</field>
<field name="hierarchy_sequence">D42.A1.01</field>
<field name="is_hierarchy_title">Memorandum and Articles of Association</field>
<field name="hierarchy_top_id">61948</field>
<field name="hierarchy_top_title">Shareholders Records</field>
<field name="callnumber-raw">D42.A1.01</field>
<field name="callnumber-raw">D42.A1.01</field>
<field name="id">61950</field>
<field name="institution">University of Liverpool Special Collections and Archives</field>
<field name="ead_note"/>
<field name="ead_other_finding_aid"/>
<field name="recordtype">Emu Records</field>
<field name="ead_separated"/>
<field name="ead_source"/>
<field name="title">Memorandum and Articles of Association</field>
<field name="title_short">Memorandum and Articles of Association</field>
<field name="title_full">Memorandum and Articles of Association</field>
<field name="ead_use_restrictions"/>
</doc>

 

 

But when I import it into Solr, 'collection' ends up the same as the title, 'hierarchy_top_id' becomes the same as the ID, and 'hierarchy_top_title' also becomes the same as the title.
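
For anyone wanting to check a batch of transformed records for this symptom, here is a rough Python sketch (not part of VuFind). It assumes each record is available as a <doc> of <field name="..."> elements like the output above; the names MIRRORED_PAIRS, fields_of and mirrored_fields are made up for illustration. It reports which of the three top-level fields merely repeat the record's own id or title.

```python
import xml.etree.ElementTree as ET

# (top-level field, record's own field it was seen to mirror) -- taken from the symptom above.
MIRRORED_PAIRS = [
    ("collection", "title"),
    ("hierarchy_top_id", "id"),
    ("hierarchy_top_title", "title"),
]

def fields_of(doc_xml):
    """Map each field name in a <doc> to the list of its non-empty values."""
    fields = {}
    for field in ET.fromstring(doc_xml).iter("field"):
        value = (field.text or "").strip()
        if value:
            fields.setdefault(field.get("name"), []).append(value)
    return fields

def mirrored_fields(doc_xml):
    """Return the top-level fields whose values equal the record's own id/title."""
    fields = fields_of(doc_xml)
    return [top for top, own in MIRRORED_PAIRS
            if top in fields and fields[top] == fields.get(own)]

# A trimmed copy of the xsltproc output quoted above.
GOOD = """<doc>
<field name="id">61950</field>
<field name="title">Memorandum and Articles of Association</field>
<field name="collection">Shareholders Records</field>
<field name="hierarchy_top_id">61948</field>
<field name="hierarchy_top_title">Shareholders Records</field>
</doc>"""

print(mirrored_fields(GOOD))  # prints [] because no top-level field repeats id/title
```

A record showing the problem would instead list the affected field names, for example ['hierarchy_top_id'] when hierarchy_top_id holds the record's own id.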

 

I have checked $VUFIND_HOME/solr/vufind/biblio/conf/schema.xml, and I have even tried adding topIrn and topTitle to the Solr index and using copyFields to populate the problem fields. For this I amended my XSL file to include:

 

    <field name="topIrn">
        <xsl:value-of select="atom[@name='topIrn'][normalize-space()]"/>
    </field>

    <field name="topTitle">
        <xsl:value-of select="atom[@name='topTitle'][normalize-space()]"/>
    </field>

    

and I removed the entries for 'collection', 'hierarchy_top_title' and 'hierarchy_top_id'. This time, staff view shows my new 'topIrn' and 'topTitle' fields, but their values (as well as those of the fields they are copied to) are incorrect: 'topIrn' shows the 'id' and 'topTitle' shows the 'title'.

 

Can anyone help shed some light on where I'm going wrong or offer another solution to try?

 

Thanks for any help,

Joe

 

Joseph Schulkins

Systems Librarian

University of Liverpool

 


Re: Maddening import

Demian Katz

Have you tried running the import/import-xsl.php script with the --test-only flag to check whether the transform is misbehaving in PHP or whether this is an index problem occurring after the transform? The PHP XSLT processor is rather old, so it is possible that the xsltproc tool you are using is not behaving consistently with it. Using the --test-only flag lets you troubleshoot exactly the same logic that will ultimately index the records.

 

Perhaps that will reveal an obvious problem. If it doesn’t, please feel free to report back with your results and I’ll give it some more thought!
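
As a rough sketch of that comparison (an illustration only, not a VuFind tool): assuming the xsltproc output and the --test-only output have each been saved as a <doc> of <field> elements, the following Python lists the fields whose values differ between the two. Whether the --test-only output has exactly that shape is an assumption, and the names load_fields and diff_docs are hypothetical.

```python
import xml.etree.ElementTree as ET

def load_fields(doc_xml):
    """Map each field name in a <doc> to the list of its (stripped) values."""
    fields = {}
    for field in ET.fromstring(doc_xml).iter("field"):
        fields.setdefault(field.get("name"), []).append((field.text or "").strip())
    return fields

def diff_docs(expected_xml, actual_xml):
    """Return {field name: (expected values, actual values)} for every field that differs."""
    expected, actual = load_fields(expected_xml), load_fields(actual_xml)
    return {name: (expected.get(name), actual.get(name))
            for name in sorted(set(expected) | set(actual))
            if expected.get(name) != actual.get(name)}

# Two tiny stand-ins: what xsltproc produced, and a hypothetical misbehaving result.
XSLTPROC_DOC = """<doc>
<field name="id">61950</field>
<field name="hierarchy_top_id">61948</field>
</doc>"""
OTHER_DOC = """<doc>
<field name="id">61950</field>
<field name="hierarchy_top_id">61950</field>
</doc>"""

for name, (want, got) in diff_docs(XSLTPROC_DOC, OTHER_DOC).items():
    print(f"{name}: expected {want}, got {got}")
```

If the two outputs agree field for field, the transform is probably not the culprit and the problem is more likely to be happening at index time.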

- Demian

 

From: Schulkins, Joe [mailto:[hidden email]]
Sent: Wednesday, May 03, 2017 11:05 AM
To: [hidden email]
Subject: [VuFind-General] Maddening import

 
