
Mapping MARC fields to Codex

codex

#1

I had some very specific thoughts about the “Instance Metadata Elements” slides from @Kathryn’s powerpoint. I’m curious what others think.

I’m wondering why 130 is mapped to creator? I think it would be appropriate to map to title. Often, a 245 is not enough to distinguish a resource. When this is the case, there should be a 130 that would help.

As was already brought up in chat, publisher should come from both 260 and 264 $b

I’m concerned about taking the publication date(s) from 26x. For serial publications at least, often the 26x$c is not populated, but the fixed field date elements should be.

As a former map cataloger, I’m also thinking about some difficulties particular to accurate identification of maps, and I’m sure other non-book formats (any music catalogers here?) have additional challenges. Would it be possible to have a field somewhat akin to the 368 “other attributes” (http://www.loc.gov/marc/authority/ad368.html) currently used in authority records? This could be mapped, for instance, to scale and coordinates fields for maps.

I’m also wondering if the “creator” field could be repeatable. Currently, due to the structure of MARC, a book with two authors will have one traced in a 100 and the other traced in a 700. But both have creator roles. (This might present a challenge with pre-RDA records?)

Finally, in some cases I think the 7xx fields could be very helpful for identification/disambiguation. For example, many serial publications have an issuing corporate body in a 710, while the 130 might have a place or date that is much less helpful.


#2

Hi, Laura…

A couple of responses to your questions and comments…

  1. Re: the 130, that’s likely my error. The uniform title should have been mapped to the Resource Name element, along with the 245 and its subfields.

  2. Mapping publisher data from both fields is an easy enough change to make.

  3. To confirm, the publication date mappings would be more accurate/complete if they came from the 008/07–14…correct?

  4. I do think that we need to have a conversation about non-book resources and some of the challenges they may present to this model – or other types of data that we may need to include to support access to that content through the Codex. This is an area in which I will need to defer to the group’s expertise…and probably one in which a workshop-type call might be most beneficial to fully exploring the issues.

  5. I see no reason why Creator can’t be repeatable…it’s a carryover from MARC, but probably an anachronistic one given where we’re headed. The challenges of mapping probably warrant a discussion, either here or in a future call.

  6. The 7xx fields have been in and out of the work I’ve done…and was a topic that I wanted to broach with the group. For the purposes of disambiguation, do you expect this information to appear on a search results display?
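For reference, points 1 and 2 above could be written down roughly as follows. This is only a sketch: the element names (`resource_name`, `publisher`, `creator`), the `map_record` helper, and the list-of-tuples record shape are assumptions made for illustration, not the actual Codex schema or any FOLIO API.

```python
# Hypothetical MARC-to-Codex mapping table (illustrative names only).
# Each Codex element lists (tag, subfield code) sources; None means "all subfields".
FIELD_MAP = {
    "resource_name": [("130", None), ("245", None)],  # uniform title plus title statement
    "publisher": [("260", "b"), ("264", "b")],        # publisher name from both fields
    "creator": [("100", None)],
}


def map_record(record):
    """Map a record given as [(tag, [(subfield_code, value), ...]), ...]."""
    codex = {element: [] for element in FIELD_MAP}
    for element, sources in FIELD_MAP.items():
        for wanted_tag, wanted_code in sources:
            for tag, subfields in record:
                if tag != wanted_tag:
                    continue
                # Keep every subfield when no code is requested, else only the match.
                values = [v for c, v in subfields if wanted_code in (None, c)]
                if values:
                    codex[element].append(" ".join(values))
    return codex
```

Keeping the mapping as data rather than code would make it easy to regenerate from the spreadsheet when a mapping changes.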

I’m going to update the spreadsheet I referenced in my slide deck with some of your comments about proper mappings so that this stuff stays in sync.

Thanks for the feedback and questions!


#3

Hi, Kathryn,
As a serials cataloger I think that mapping the data from the 246 and 247 fields is important as well as mapping the series statements from the 490 and 830 fields. Having all of this info would help with searching and easily identifying/disambiguating records. And I agree with Laura–we should definitely consider including info from some 7xx fields and that mapping dates from the 008/07-14 would be best.


#4

Hi Kathryn,
I agree that the 7xx fields (particularly the 700 added entry personal name) should be traced in the Codex. The difference between the 100 and the 700 is a legacy from the days of the card catalog, I believe. At times the difference between an author being placed in the 100 and the 700 is as simple as the order in which they are listed on the title page.
I think reaching out to experts in other formats is a good idea. In addition to maps, music, and a/v, I would like to reach out to some of the DCRM community (the DCRM stands for Descriptive Cataloging of Rare Materials).
Thanks,
Sarah


#5

Thanks for this input, Natascha…

I’m going to add these fields to the spreadsheet I’m maintaining (https://docs.google.com/spreadsheets/d/1yVc-BYtM5eQF1zIooeP3b7JP2wr5_ztYsPIHwV3WBfg/edit?usp=sharing) so that all of the fields that we need to consider are in one spot, and I’ll discuss with the architecture team in our call later today so that we’re all prepared for tomorrow’s MM SIG conversations!


#6

Likewise, thanks to you, Sarah, for this input…I’ll add these comments to the spreadsheet (https://docs.google.com/spreadsheets/d/1yVc-BYtM5eQF1zIooeP3b7JP2wr5_ztYsPIHwV3WBfg/edit?usp=sharing), as well!

I’ll also see if we can reserve a few minutes with the group tomorrow to talk about identifying and involving some experts in other formats. That’s a real need and will require some special focus.


#7

Hi again, @LauraW

As I was updating the spreadsheet, I realized I needed some additional input from you (and others who have indicated that the 7XX fields should be included in the Codex).

Currently, the 700, 710, 711, and 720 are included in the Codex, but mapped to Contributor – is that correct? (Actually, as I re-read @sarahlschmidt’s comments, I believe she’s indicating that they would most likely be mapped to Creator, since the 700 field is often populated with second+ authors, etc.).

Thanks!


#8

I would like to know what others think (@natascha ? @sarahlschmidt ?). I think that since 700, 710, and 711 (and I assume 720 also, though my institution doesn’t use it) may contain either Creator or Contributor names, it would make more sense to have a single (repeatable) Codex category for Creator/Contributor.

Can anyone think of an example where this would cause problems/confusion?


#9

Laura, I can’t think of any reason why Creator/Contributor shouldn’t be combined into one category for the purposes of the Codex.

Kathryn, I was just taking a peek at the spreadsheet and I thought it might be worth clarifying the various date info that is found in serial MARC records. The 008/07-14 dates are actually drawn from the 362 field(s) whenever possible. Generally, the dates in the 362 field apply to the chronological designation of publication and not the actual publication date. For example, an annual report might have a chronological designation of 2014 with a publication date of 2015. However, if there is no chronological info then the publication date is used. So, sometimes the dates in the 008/07-14 correspond to the dates in the 260/264 fields (if there are any) but sometimes they don’t. Anyway, this is just reiterating that we should map “publication” dates from the 008/07-14.
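A minimal sketch of reading those positions, assuming the 008 is available as a plain string. The function name `dates_from_008` is hypothetical; the positions themselves follow the MARC 21 bibliographic format (06 is the type-of-date code, 07-10 is Date 1, 11-14 is Date 2).

```python
def dates_from_008(field_008):
    """Return (date_type, date1, date2) from a MARC 008 control field string.

    Dates containing unknown digits (coded 'u') or blanks are returned as None.
    A Date 2 of 9999 (used for serials that are still being published) is
    returned as the integer 9999; callers may want to treat it as open-ended.
    """
    if len(field_008) < 15:
        raise ValueError("008 is too short to contain positions 06-14")

    def parse(chunk):
        return int(chunk) if chunk.isdigit() else None

    date_type = field_008[6]
    return date_type, parse(field_008[7:11]), parse(field_008[11:15])
```

Returning the type-of-date code alongside the dates lets the mapping decide how to interpret them, since the same positions mean different things for serials, reprints, and single-date monographs.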


#10

A combined Creator/Contributor Codex field seems logical, but it raises other questions/considerations.

How do we map/display the fields in a way that ‘makes sense’ - do we map the 1XX field (when it exists) to the first Creator/Contributor field? Does that get a label other than “Creator/Contributor”? Or is it displayed differently than the other Creator/Contributors?
If we don’t distinguish 1xx mapped Cr/Co field from 7XX Cr/Co fields, we could create an interpretation/parsing problem - I’m thinking here of records for video specifically, where the list of contributors/actors/directors/etc. etc. can get quite long.
Relator terms (when they exist) might help with parsing, but there are lots of legacy records that don’t have that data.
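One possible way to keep the 1XX entry distinguishable inside a combined field is sketched below. Everything here is an assumption for illustration: the `NameEntry` shape, the `primary` flag, and the list-of-tuples record format are not part of any agreed Codex design. The sketch keeps the source tag, lists the 1XX name first, and carries a relator term when the record has one ($e, or $j in 111/711 where $e holds the subordinate unit).

```python
from dataclasses import dataclass
from typing import Optional

# Name-bearing tags discussed in this thread.
NAME_TAGS = {"100", "110", "111", "700", "710", "711", "720"}


@dataclass
class NameEntry:
    name: str
    source_tag: str         # kept so a display can label or group entries
    primary: bool           # True when the name came from a 1XX field
    relator: Optional[str]  # relator term if the record supplies one


def first(subfields, code):
    """Return the first value for a subfield code, or None."""
    return next((v for c, v in subfields if c == code), None)


def collect_names(record):
    """record: [(tag, [(subfield_code, value), ...]), ...]"""
    entries = []
    for tag, subfields in record:
        if tag not in NAME_TAGS:
            continue
        relator_code = "j" if tag in ("111", "711") else "e"
        entries.append(NameEntry(
            name=first(subfields, "a") or "",
            source_tag=tag,
            primary=tag.startswith("1"),
            relator=first(subfields, relator_code),
        ))
    # Stable sort: the 1XX entry moves to the front, 7XX entries keep record order.
    entries.sort(key=lambda e: not e.primary)
    return entries
```

With the source tag retained, a long cast list on a video record can still be collapsed or grouped at display time without losing which name was the main entry.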

my 2cents!


#11

Hey All - apologies if I’m way out of place here, I’d like to contribute, but I’ve not found a gap in the conversation that felt right - but at some point my technical discomfort has outweighed my social anxiety and this is the result :wink:

When we did the work on the eBooks pilot in GOKb (some of that thinking made it into the Kabalog thinking that was at least a tributary precursor to some of the Codex ideas) we had a pretty strong requirement for creator data (and subject data, and a few other reference properties).

I’m reading this thread alongside the linked data thread also, and the thing that I’d like to throw out there is that as a storage (Codex?) model Person (Beer, Stafford) -> Role (Author) <- Resource (“Platform for Change”) — In effect formally separating out Work, Person and the Work-Person relation/role — might make much more sense than work.creator -> Author. This is especially true where we might want creators to play other roles, but be able to walk semantic relationships. I’m thinking particularly here with cataloguing and managing datasets for researchers so they can cite those datasets in published works. Similar problems appear in article processing charge handling systems for OA publishing, and in Reserves/other course materials handling.
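To make the Person -> Role <- Resource shape concrete, here is a rough sketch. The class names (`Person`, `Resource`, `RoleLink`, `Store`) and the in-memory storage are assumptions for illustration only, not a proposed FOLIO schema; the second resource is an invented placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Person:
    id: str
    name: str


@dataclass(frozen=True)
class Resource:
    id: str
    title: str


@dataclass(frozen=True)
class RoleLink:
    """The relation itself is a record, so it can carry the role."""
    person_id: str
    resource_id: str
    role: str


@dataclass
class Store:
    people: Dict[str, Person] = field(default_factory=dict)
    resources: Dict[str, Resource] = field(default_factory=dict)
    links: List[RoleLink] = field(default_factory=list)

    def relate(self, person, role, resource):
        self.people[person.id] = person
        self.resources[resource.id] = resource
        self.links.append(RoleLink(person.id, resource.id, role))

    def resources_for(self, person_id, role=None):
        """Walk from a person to resources, optionally filtered by role."""
        return [self.resources[l.resource_id] for l in self.links
                if l.person_id == person_id and role in (None, l.role)]

    def people_for(self, resource_id, role=None):
        """Walk from a resource to people, optionally filtered by role."""
        return [self.people[l.person_id] for l in self.links
                if l.resource_id == resource_id and role in (None, l.role)]


store = Store()
beer = Person("p1", "Beer, Stafford")
store.relate(beer, "author", Resource("r1", "Platform for Change"))
store.relate(beer, "creator", Resource("r2", "Example dataset"))  # placeholder resource
```

Because the person exists independently of any one work, the same `Person` record can be reused by a dataset or reserves app without going through the catalog record.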

I guess my worry is that as soon as you start to see the related items (Authors, Subjects, Editors, Publishers, etc) in the context of satellite “apps” (FOLIO speak) like APC Handling, Dataset cataloging, Course Reserves, Current Awareness Services, etc you start to need to treat people/subjects as first class citizens that can have their own relationships outside the catalog of works/instances/items.

My worry about this thread is that it seems to lean very heavily on (and to accept as a given) a traditional bib perspective. I spoke with seb (H) the other day and I felt there was a real miscommunication about how problems like this are perceived in bibframe too - and the root of that miscommunication was the conflation of a specific serialisation of the model with the model itself. I’m feeling like those assumptions might be baked in here also.

It strikes me that discussing the marc mapping in this way, we might be skipping a whole load of detailed domain mapping that might bite us later on. Apols if I’m speaking out of turn here - it’s not an easy thread to break into, and I’m substantially intimidated by the breadth of experience here - but I’m a bit wary of how this might play out in terms of the final built system, which is why I’m sticking my head up now.

Apols if this is all taken care of in the thinking already, really I just wanted to find an in to the conversation.

Cheers,
e


#12

Hi Ian-
I appreciate you speaking up! Speaking for myself only :wink: I do have a bit (a lot) of legacy, cataloger, tunnel-vision, so it is definitely helpful to have folks outside the catalog-centric world give us a different perspective.

I’m in agreement that breaking out the Work / Person / Work-person relationship data points is going to serve us best in the long term.

I think what we’re wrestling with is the iceberg of legacy bibliographic data where the relationship/role is not explicitly stated, but inferred by assigning a name entry the 1xx MARC tag or 7xx tag - which is, I think, what leads us to the “Creator” / Contributor data bucket(s).

@vbar - would it make more sense for the Codex to have a “Name” data bucket - with role attributes that can be very granular if they are specified in the source record (think MARC 1xx/7xx subfield e or subfield 4) or very general (creator, contributor, publisher) if they are not specified in the source record?
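A sketch of what that fallback could look like on the mapping side, under stated assumptions: the function name `resolve_roles`, the `granularity` label, and the default role per tag range are hypothetical, and the relator term is read from $e (or $j in 111/711) with the relator code from $4.

```python
# Hypothetical general roles used when the source record states none.
DEFAULT_ROLE_BY_TAG_PREFIX = {"1": "creator", "7": "contributor"}


def resolve_roles(tag, subfields):
    """Return granular roles if the field carries them, else a general role.

    subfields: [(subfield_code, value), ...]
    """
    term_code = "j" if tag in ("111", "711") else "e"
    specific = []
    for code, value in subfields:
        if code in (term_code, "4"):
            cleaned = value.strip(" .,")  # drop trailing ISBD punctuation
            if cleaned:
                specific.append(cleaned)
    if specific:
        return {"granularity": "specific", "roles": specific}
    general = DEFAULT_ROLE_BY_TAG_PREFIX.get(tag[0], "contributor")
    return {"granularity": "general", "roles": [general]}
```

Recording whether a role was stated or inferred would let later clean-up work target exactly the legacy records that only have a general role.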

This means more work on the mapping side of things - and would the work be worth it, if the Codex is really for internal FOLIO functions, and not exposed outside of FOLIO?


#13

Ian, definitely a tangent to this discussion, so apologies. I wanted to pick up on an implication in what you posted about using semantic tagging to help with work/person/work-person relationships. In your thoughts, you talk about the advantages of separating out work/person/work-person relationships, and use datasets as an example that would benefit from this approach.

Here’s the tangent…it would be helpful, for objects such as datasets and perhaps other types too, to have an ID minting facility in FOLIO that could be marshaled from relevant workflows, such as the one discussed here for mapping incoming source records to the Codex. The ID Mint would allow an operator to generate and register, with appropriate management infrastructure, a unique identifier to associate with an object. This would allow a service from the library to the campus for minting DOIs or other domain identifiers and managing these for researchers and other users. The ID Mint would be a separate module (perhaps several with some coordinating module that manages UX) that generates the appropriate object identifier and assigns it to the object. Useful for datasets, local repo objects, or other sorts of unique local content.

#folio-future


#14

Interesting – this tangent is related to the Sharable Local Name Authorities project that is coming to a close with the drafting of its report this summer. I wrote about this project and how it might intersect with FOLIO last year. This is a more general case for the specific local name authorities issue (minting identifiers for all sorts of things), with the added twist that they somehow be published and discoverable by others that might want to link to them as part of their own linked data stores.

This probably isn’t a version-1 deliverable, though, so I’m tempted to tag this as a ‘#folio-future’ idea.


#15

Peter…tag away!

Good point you make, and one I meant to add is that FOLIO could have a module that provides a public interface for resolving locally minted identifiers. External interfaces may require a specific implementation to respond to resolution requests from the network, but could be built on a general capability for lightweight discovery support.


#16

Yep, to tie this thread back to others, I also think that this comes down to doing a little more rich domain modelling within FOLIO itself -

  1. Because I’d like to see if it’s feasible to come up with any standards that let us use hashing as a way to create identifiers rather than just minting another opaque ID - It seems really attractive to me to be able to say something like check index for HASH(PERSON+SURNAME+INITIAL+DISCRIMINATOR). I can imagine that we might need several variant “Standard” hashes (for different combinations).

  2. Related - I’ve been pondering the idea of using some kind of deferred name resolution for the reference data - Essentially the ability to go RESOURCE -> REFDATA_OCCURRENCE <- CONTROLLED REFDATA. Although I know this looks just like a standard relational M:N or a semantic blank node, the approach taken in Jisc MONITOR was to use the relationship node to carry the refdata as it appeared in the source record. So if the author name “Beer, s” appears in a record, that is held in REFDATA_OCCURRENCE. We then later on do the more complex job of resolving that into our controlled vocab. This allows us to both store exactly what was in the source record, but also connect it to authority data. The down side is that you then have 2 places to choose from, although one would hope that the authority data wins. The other good thing about this is that it gives us a pivot point to correct errors when our name resolution jobs get the result wrong.
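The two ideas above can be sketched together. This is an illustration under loud assumptions: the normalisation rules, the choice of SHA-256, the `RefdataOccurrence` shape, and the naive "surname, initial" parsing are all invented here and are not taken from Jisc MONITOR or any FOLIO module.

```python
import hashlib
import unicodedata
from dataclasses import dataclass
from typing import Optional


def person_key(surname, initial, discriminator=""):
    """HASH(PERSON+SURNAME+INITIAL+DISCRIMINATOR) over normalised parts."""
    parts = ["PERSON", surname, initial, discriminator]
    normalised = "|".join(
        unicodedata.normalize("NFKC", p).strip().casefold() for p in parts
    )
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass
class RefdataOccurrence:
    """Relationship node: keeps the value exactly as it appeared in the source."""
    resource_id: str
    as_in_source: str
    controlled_id: Optional[str] = None  # filled in later by resolution


def resolve(occurrence, authority_index):
    """Deferred resolution: look the source string up by its hash key.

    authority_index maps person_key(...) values to controlled identifiers.
    The source string is never overwritten, so a wrong match can be undone.
    """
    surname, _, rest = occurrence.as_in_source.partition(",")
    initial = rest.strip()[:1]
    occurrence.controlled_id = authority_index.get(person_key(surname, initial))
    return occurrence
```

Resetting `controlled_id` to `None` on an occurrence is then the "pivot point" for correcting a bad match, since the original string is still there to be resolved again.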

Certainly seems that shared name authorities are a core part of this tho - along with mechanisms for propagating and sharing the contents of those authority files (Which turns out to be more tricky than the source process of minting new identifiers)

Be good to get some concrete ideas on how we explore some of these issues in detail.


#17

I completely understand the issue that Ian is raising about whether to “treat People as first class citizens” within the FOLIO model and furthermore how to represent and manage the semantic relationships. I agree that providing support for an abstract model that represents complex relationships between various entities is not in the current Codex scope. The focus for Codex has been around supporting Resources (instances and items). The concept of a Work is on the periphery of the discussion, but out of scope for v1. That Work object is meant to introduce the notion of relationships as they relate to resources.

The Codex, in contrast to the previously discussed Kabalog, is a higher level normalization layer that seeks to simplify and flatten data structures, including any complex semantic relationships. So I would imagine that there remains the opportunity to create a separate domain to define and manage those semantic relationships. That domain could then inform other domains such as Codex or Inventory.


#18

@ianibbo & @Lynn_W
You’re right, many of us here have a very MARC-centric perspective (well, at least I do) and while it’s important that FOLIO be able to work with MARC we definitely need to think beyond that.

I’m trying to make my way through the new IFLA LRM (Library Reference Model, i.e., the successor to FRBR) and I really like the idea of having an even broader entity than either you or I have suggested, something like “Agent” or “Name” (i.e., some term that encompasses not just individual persons but corporate bodies, or really any entity capable of creating or contributing to a resource) and a “Role” that could be as generic as “contributor” or as specific as “director” or “actor” or “editor” or “illustrator.” So, if there are granular relationships coded in the source record, they have a place to be represented in the Codex, and if not, there is still a place for the data to live.


#19

Hey Vince!

Ah you’re absolutely right to raise this - and it came up when Marc J and I discussed this. I think where Marc and I left this we managed to confuse ourselves a little about the exact boundary between the codex and more expressive domain model. It would be incredibly helpful for me to try and understand the service boundary. I know that might mean having a conversation at a more concrete level, but I think it would help.

We’ve got some code that takes marc records (A big batch) which MW mediated for us from Chicago. It would be really useful to understand how we see the flow of onboarding a recordset through some command interface and the FOLIO pathways.

I suddenly realise I’m exposing the fact I perhaps really don’t understand how people see these things fitting together - if anyone can suggest ideas I’d really appreciate it.


#20

Hi @ianibbo,
@Kathryn has started a new topic re. collection of use cases to be discussed at today’s MM SIG meeting, e.g. see @fhemme’s post about Union Catalog records