OAI Server

Luke O'Sullivan
Hi Folks,

I'm in the process of creating an OAI feed for the British Library to
harvest our Theses. I've created a new RecordDriver for Theses which
has a check in the getXML method for the "uketddc" metadata format.

I've also used the set_query option to apply the following rules:

set_query['thesis'] = "format:Thesis"

This does work, but a few records slip through the cracks: format:Thesis
also returns some records which do not use the new RecordDriver and
therefore cannot provide the required XML.

When the OAI Server encounters a record which does not provide uketddc
XML, it throws an error, thus killing the OAI process:

//VuFind/OAI/Server::listRecords
foreach ($result->getResults() as $doc) {
    if (!$this->attachNonDeleted($xml, $doc, $format, $headersOnly)) {
        $this->unexpectedError('Cannot load document');
    }
    $currentCursor++;
}

Is there any reason why this error needs to be thrown? Could it fail
silently or does the OAI-PMH standard insist on the correct record
count?

Thanks,

Luke

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech
Re: OAI Server

Demian Katz
Luke,

I never put time into investigating whether we could handle unsupported metadata formats more robustly, since the OAI server was coded on the assumption of relatively homogeneous content (i.e. every record can be rendered in every advertised format). There's probably a better approach, but implementing something more flexible would require a close and careful study of the spec, and it might force us to do things that hurt performance (e.g. if the spec requires an exact count of supported records up front, computing that count could be expensive).

Before going down that road, I would suggest two possible alternative solutions that could work with the existing implementation:

1.) Can you create a trait to implement the uketddc metadata format, and then override all of the record drivers currently in play in your system to incorporate that trait? Obviously, if you need special fields only available in a subset of your records, that's not going to work. But if it's possible to create a 'bare minimum' implementation that works with existing universal fields like title and author, this approach could get you 100% coverage.

2.) If you do have to filter to a particular record driver, you can change the Solr biblio schema so that the recordtype field is indexed as well as stored, then reindex your data. After that you can filter on recordtype:your-custom-type and be 100% sure that no bad records will ever creep into your feeds.
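
As a rough illustration of option 1, here is a minimal, self-contained sketch. Everything in it is an assumption made for the example: the trait name, the stub classes standing in for real record drivers, the getTitle()/getPrimaryAuthor() accessors, and the placeholder element names (which are not the real uketddc schema). It only shows the shape of the idea: the trait answers the "uketddc" format itself and defers every other format to the parent driver.

```php
<?php
// Illustrative sketch only: the trait, the stub classes and the element
// names are hypothetical, not VuFind code and not the real uketddc schema.

trait UketdDcTrait
{
    /**
     * Return XML for the requested format, adding "uketddc" support on top
     * of whatever the parent driver already handles.
     */
    public function getXML($format, $baseUrl = null, $recordLink = null)
    {
        if ($format === 'uketddc') {
            return $this->buildMinimalUketdDc();
        }
        return parent::getXML($format, $baseUrl, $recordLink);
    }

    /**
     * Build a bare-minimum record from fields assumed to be universal.
     */
    protected function buildMinimalUketdDc()
    {
        $doc = new DOMDocument('1.0', 'UTF-8');
        $root = $doc->appendChild($doc->createElement('uketddc'));
        $fields = [
            'title' => $this->getTitle(),
            'creator' => $this->getPrimaryAuthor(),
        ];
        foreach ($fields as $name => $value) {
            $node = $root->appendChild($doc->createElement($name));
            // Text nodes are escaped automatically when serialized.
            $node->appendChild($doc->createTextNode((string)$value));
        }
        return $doc->saveXML($root);
    }
}

// Hypothetical stand-in for an existing record driver.
class StubBaseDriver
{
    protected $fields;

    public function __construct(array $fields)
    {
        $this->fields = $fields;
    }

    public function getTitle()
    {
        return $this->fields['title'] ?? '';
    }

    public function getPrimaryAuthor()
    {
        return $this->fields['author'] ?? '';
    }

    public function getXML($format, $baseUrl = null, $recordLink = null)
    {
        return false; // this stub supports no formats on its own
    }
}

// Each driver in use would get a thin subclass that pulls in the trait.
class StubDriverWithUketdDc extends StubBaseDriver
{
    use UketdDcTrait;
}
```

In a real installation each overriding driver class would also have to be registered in place of the driver it extends; that wiring is deliberately left out here.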

Let me know if you need more help with any of this!

- Demian

-----Original Message-----
From: Luke O'Sullivan [mailto:[hidden email]]
Sent: Wednesday, February 08, 2017 4:14 AM
To: vufind-tech
Subject: [VuFind-Tech] OAI Server

Re: OAI Server

hatop
In reply to this post by Luke O'Sullivan

  Hi Luke,

  I have a similar setup. For records which do not provide the
  required format, my RecordDriver returns '' instead of false.

  This makes the OAI server silently skip the record (line 318
  // If RecordDriver returns nothing, skip this record), which I
  believe is sufficient for the spec.
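
To make the difference between '' and false concrete, here is a small self-contained sketch. The driver class and the attachRecord() function are hypothetical stand-ins written for this example; they imitate the behaviour described above (false is treated as an error, an empty result is skipped) and are not copied from VuFind.

```php
<?php
// Illustrative sketch only: both the class and the function are
// hypothetical stand-ins, not VuFind code.

class StubPlainDriver
{
    public function getXML($format)
    {
        if ($format === 'oai_dc') {
            return '<dc/>'; // placeholder payload
        }
        // Empty string rather than false: "nothing to output for this
        // record", as opposed to "something went wrong".
        return '';
    }
}

/**
 * Simplified imitation of the server-side check: false is fatal,
 * an empty result means the record is skipped without complaint.
 */
function attachRecord(array &$out, $driver, string $format): bool
{
    $xml = $driver->getXML($format);
    if ($xml === false) {
        return false; // the caller would raise 'Cannot load document'
    }
    if (empty($xml)) {
        return true; // skip silently; harvesting continues
    }
    $out[] = $xml;
    return true;
}
```

With this arrangement a request for uketddc leaves the output list empty but still reports success, so one unsupported record no longer aborts the whole response.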
 
  -- goetz.

