
Revisiting an old import question

5 messages

Revisiting an old import question

Shepard, Thomas - 1150 - MITLL

Last year I posted a question about methods for normalizing catalog data as it is imported into VuFind, specifically vocabulary fields used in facets.

At the time I hoped to transform values that arrive in a variety of text formats into a common format, preferably with the first letter capitalized and the rest lowercase.

I understood that this was probably not possible using regex, because Java's regex replacement syntax has no way to change the case of matched text.
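For reference, case conversion inside a regex substitution generally becomes possible when the replacement is computed by code for each match rather than given as a fixed string (in Java, by building the result match by match, e.g. with Matcher.appendReplacement). A minimal Python sketch of the "first letter cap, the rest lowercase" normalization; sentence_case is a hypothetical helper, not part of VuFind or SolrMarc:

```python
import re


def sentence_case(value: str) -> str:
    """Lowercase the whole value, then uppercase its first word character.

    The replacement is a function rather than a fixed string, which is what
    allows the case of the matched text to be changed.
    """
    return re.sub(r"\w", lambda m: m.group(0).upper(), value.lower(), count=1)


print(sentence_case("NEURAL NETWORKS"))  # Neural networks
print(sentence_case("radar SYSTEMS."))   # Radar systems.
```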

 

Well, I am wondering whether anything has changed since last year that might suggest a new workaround.

To use an example, one of our faceted vocabulary fields is currently set up as follows:

tax = custom, removeTrailingPunct(691a)

Can removeTrailingPunct be modified (or copied and renamed) to convert the data in 691a to ALL CAPS? If so, how might one go about it?

Sorry to be a pest about this, but it is kind of important to us right now.

Happy Holidays,

Thom

Thom Shepard

MIT Lincoln Lab
244 Wood St.

Lexington, MA 01523

[hidden email]

781 981 0370

 


------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/intel
_______________________________________________
Vufind-tech mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/vufind-tech


Re: Revisiting an old import question

Demian Katz

Thom,

 

SolrMarc 3.x offers an expanded index specification language that should make this much easier than before. See this wiki page for some documentation:

 

https://github.com/solrmarc/solrmarc/wiki/Index-Specification-Language

 

For the example you mention, I believe the solution is:

 

tax = 691a, stripPunct, toUpper
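Conceptually, the modifiers after the field specification are applied to each extracted value in turn. A rough Python model of that chain, assuming stripPunct removes trailing punctuation the way removeTrailingPunct did (the exact character set SolrMarc uses may differ; the wiki page above is authoritative). strip_punct, to_upper and apply_chain are illustrative names, not SolrMarc code, and the 691a values are made up:

```python
from typing import Callable, Iterable, List

# Assumption: these are the trailing characters a stripPunct-style modifier removes.
_TRAILING = " .,;:/"


def strip_punct(value: str) -> str:
    """Approximation of a trailing-punctuation modifier (not SolrMarc's code)."""
    return value.rstrip(_TRAILING)


def to_upper(value: str) -> str:
    return value.upper()


def apply_chain(values: Iterable[str], *modifiers: Callable[[str], str]) -> List[str]:
    """Apply each modifier, left to right, to every extracted value."""
    result = []
    for value in values:
        for modify in modifiers:
            value = modify(value)
        result.append(value)
    return result


# Hypothetical 691a values as they might arrive from different records.
print(apply_chain(["Radar systems.", "radar Systems;", "RADAR SYSTEMS"],
                  strip_punct, to_upper))
# ['RADAR SYSTEMS', 'RADAR SYSTEMS', 'RADAR SYSTEMS']
```

Since all three variants normalize to the same string, they would collapse into a single facet value.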

 

In order to get SolrMarc 3.x, you’ll either need to upgrade to VuFind 3.1.x or else manually install the new SolrMarc version into your existing release. If you need to patch something together, it’s probably mostly a matter of copying the import-marc.sh script from the latest VuFind release along with the relevant .jar files and other contents of the import directory.

 

If you still have questions or problems, please let me know!

 

- Demian

 

From: Shepard, Thomas - 0050 - MITLL [mailto:[hidden email]]
Sent: Tuesday, December 20, 2016 2:15 PM
To: [hidden email]; [hidden email]
Subject: [VuFind-Tech] Revisiting an old import question

 


Re: Revisiting an old import question

Shepard, Thomas - 1150 - MITLL

Demian,

Actually, our library programmer has come up with an alternative solution for lower- and upper-casing terms in schema.xml, which seems to work.

 

That said, he has encountered a different problem:

During import, he needs to concatenate two values (URL + description) into a single value, separated by a pipe if possible.

 

I set up the following in marc_local.properties:

url = 856u:856z

 

but of course this merely results in VuFind treating the values as separate entities; for example:

        "url":["http://www.crcnetbase.com/isbn/978-1-4398-6282-7",
          "http://www.crcnetbase.com/isbn/978-1-4398-6287-2",
          "Vol.1",
          "Vol.2"],

 

So in cases like the above, where a record has multiple URLs and multiple |z descriptions, we are concerned that the order of the two fields won’t always match up, especially if one URL lacks a |z subfield.
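To make the concern concrete: if all 856 $u values and all 856 $z values are collected into two separate lists, the link between a URL and its own description is lost as soon as one field has no $z. A small Python sketch that models each 856 field as a list of (subfield code, value) pairs (a simplification, not SolrMarc's data model); the record and URLs are hypothetical:

```python
from typing import List, Tuple

Field = List[Tuple[str, str]]  # one 856 field as (subfield code, value) pairs


def collect(fields: List[Field], code: str) -> List[str]:
    """Gather every value of one subfield code across all fields, like 856u:856z."""
    return [value for field in fields for sub, value in field if sub == code]


# Hypothetical record: the first 856 has no $z.
fields = [
    [("u", "http://example.org/vol1")],
    [("u", "http://example.org/vol2"), ("z", "Vol.2")],
]

urls = collect(fields, "u")
notes = collect(fields, "z")
print(list(zip(urls, notes)))
# [('http://example.org/vol1', 'Vol.2')]  <- wrong pairing, and vol2 is dropped
```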

Anyway, we are hoping there is a function that will concatenate 856u and 856z so that our programmer can do his parsing magic and be guaranteed that the right description goes with the right URL.

Any thoughts?

 

Thanks,

Thom

 

 

From: Demian Katz [mailto:[hidden email]]
Sent: Tuesday, December 20, 2016 3:23 PM
To: Shepard, Thomas - 0050 - MITLL; [hidden email]; [hidden email]
Cc: [hidden email]
Subject: RE: Revisiting an old import question

 


Re: Revisiting an old import question

Demian Katz

Thom,

 

I believe what you want, using the SolrMarc 3.x syntax, would be:

 

url = 856uz, join("|")
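Assuming 856uz is evaluated one 856 field at a time (the sample output later in this thread bears this out), join keeps each URL together with its own description. A rough Python model of that behavior, not SolrMarc's implementation; join_subfields is a hypothetical name, skipping fields with no matching subfields is an assumption, and the URLs are placeholders:

```python
from typing import List, Tuple

Field = List[Tuple[str, str]]  # one MARC field as (subfield code, value) pairs


def join_subfields(fields: List[Field], codes: str, sep: str = "|") -> List[str]:
    """One output value per field: the requested subfields joined in record order."""
    out = []
    for field in fields:
        parts = [value for code, value in field if code in codes]
        if parts:  # assumption: fields with none of the codes produce nothing
            out.append(sep.join(parts))
    return out


# Hypothetical 856 fields: the first has no $z.
fields = [
    [("u", "http://example.org/vol1")],
    [("u", "http://example.org/vol2"), ("z", "Vol.2")],
]
print(join_subfields(fields, "uz"))
# ['http://example.org/vol1', 'http://example.org/vol2|Vol.2']
```

Because join follows the subfield order in the record, a field whose $z precedes its $u comes out description-first.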

 

There may be a way to do this in the 2.x syntax as well, though I’m less certain about that. If you need it, Bob might be able to help clarify the possibilities.

 

- Demian

 

From: Shepard, Thomas - 0050 - MITLL [mailto:[hidden email]]
Sent: Wednesday, December 21, 2016 10:14 AM
To: Demian Katz; [hidden email]; [hidden email]
Cc: [hidden email]
Subject: RE: Revisiting an old import question

 


Re: Revisiting an old import question

Robert Haschart
Demian is right about the syntax for SolrMarc 3.x. As he surmised, it was doable in SolrMarc 2.x as well, where you could do the following:
url = 856uz'|'
I'm not sure that syntax was ever documented anywhere, though.

However, when I checked the output of Demian's suggestion on a small set of sample records, it seems that it won't do exactly what you want.

On a record with this field:
856 40$uhttp://virginia.kanopystreaming.com/node/222222$zA Kanopy streaming video
the result is exactly what you would hope for:
url : http://virginia.kanopystreaming.com/node/222222|A Kanopy streaming video
However, consider records where the $z subfield occurs before the $u subfield, records without a $z subfield, and records with two $z subfields, as in the following examples:
856 42$zInternet Movie Database Summary and Reviews:$uhttp://us.imdb.com/title/tt0040897/

856 41$uhttp://www.nap.edu/books/0309034442/html/

856 7 $zOnline version:$uhttp://www.stat-usa.gov/BEN/bea1/scb.html$2http$z(requires Adobe Acrobat software which is available for download)
The respective results are not so good:
url : Internet Movie Database Summary and Reviews:|http://us.imdb.com/title/tt0040897/

url : http://www.nap.edu/books/0309034442/html/

url : Online version:|http://www.stat-usa.gov/BEN/bea1/scb.html|(requires Adobe Acrobat software which is available for download)
I've just committed code for a new feature that allows you to specify a format for a field:
url = 856uz, format("$u|$z")
With it, the results for the four examples above are as follows:
url : http://virginia.kanopystreaming.com/node/222222|A Kanopy streaming video

url : http://us.imdb.com/title/tt0040897/|Internet Movie Database Summary and Reviews:

url : http://www.nap.edu/books/0309034442/html/|

url : http://www.stat-usa.gov/BEN/bea1/scb.html|Online version:
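The four results above are consistent with a simple rule: each $x placeholder in the format string is replaced by the first occurrence of that subfield in the field, or by an empty string if the subfield is absent. A Python sketch of that rule as inferred from the outputs shown (not taken from SolrMarc's source); format_field is a hypothetical name:

```python
import re
from typing import Dict, List, Tuple

Field = List[Tuple[str, str]]  # one MARC field as (subfield code, value) pairs


def format_field(field: Field, template: str) -> str:
    """Replace each $<code> in template with the first such subfield, or ''."""
    first: Dict[str, str] = {}
    for code, value in field:
        first.setdefault(code, value)  # keep only the first occurrence of each code
    # A function replacement inserts subfield text literally (no escape processing).
    return re.sub(r"\$(\w)", lambda m: first.get(m.group(1), ""), template)


# The fourth example above: $z before $u, and two $z subfields.
field = [
    ("z", "Online version:"),
    ("u", "http://www.stat-usa.gov/BEN/bea1/scb.html"),
    ("2", "http"),
    ("z", "(requires Adobe Acrobat software which is available for download)"),
]
print(format_field(field, "$u|$z"))
# http://www.stat-usa.gov/BEN/bea1/scb.html|Online version:
```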
This new code will be included in the release planned for the first week of January, when I'm back in the office.

The release will likely also include support for a new modifier operator, "toTitleCase", which is analogous to "toLower" or "toUpper" but converts the input to "title case", where the first letter of each word is capitalized and the rest are lowercase.
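A minimal sketch of the title-case behavior as described here (first letter of each word uppercase, the rest lowercase). It splits on spaces only; how the real toTitleCase treats hyphens, apostrophes or short words such as "of" is not stated, so treat this as an approximation. to_title_case is a hypothetical name. Python's built-in str.title() is avoided because it also capitalizes after apostrophes ("don't" becomes "Don'T"):

```python
def to_title_case(value: str) -> str:
    """Uppercase the first character of each space-separated word, lowercase the rest."""
    # split(" ") keeps runs of spaces as empty strings, so spacing is preserved.
    return " ".join(word[:1].upper() + word[1:].lower() for word in value.split(" "))


print(to_title_case("RADAR signal PROCESSING"))  # Radar Signal Processing
print(to_title_case("o'brien's papers"))         # O'brien's Papers
```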


-Bob Haschart

From: [hidden email] [[hidden email]] on behalf of Demian Katz [[hidden email]]
Sent: Wednesday, December 21, 2016 10:26 AM
To: Shepard, Thomas - 0050 - MITLL; [hidden email]; [hidden email]
Cc: [hidden email]
Subject: [solrmarc-tech] RE: Revisiting an old import question

--
You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to solrmarc-tech+unsubscribe@....
To post to this group, send email to solrmarc-tech@....
Visit this group at https://groups.google.com/group/solrmarc-tech.
For more options, visit https://groups.google.com/d/optout.
