Codex Metadata Analysis - When should a new instance be created...?

codex

#1

@Kathryn - In your Codex Metadata Analysis slides, on slide 32, you ask, "When should a new instance be created and how will FOLIO know?"

Is the desire to have FOLIO be able to detect these without human intervention? I almost want to assume that is the intention, but it would be great to have that confirmed before jumping into your slides on the ISBN/Instance discussion.

Thank you!


#2

Several different thoughts on this:

  1. I feel like the answer to when a new instance is created hinges on the precise definition of “instance.” I have a good sense of what an instance is, but still feel like I need a more prescribed definition.
  2. Given the proliferation of multiple ISBNs in current MARC records, I am very leery of the idea of ISBN driving the creation of instance records.
  3. In general, I am very much in favor of as much granularity as possible. However, if two different sources of information (say a KB record vs a local MARC record) actually describe the same thing, I don’t think there should be two different instance records. Which brings me back to point #1.
  4. This is a bit off topic, but I think it would be important to allow individual institutions to configure the codex to display instance records separately or lump them together (using locally defined criteria).

Would it be safe to look at the BibFrame definition of “instance” to help with question #1?


#3

I really appreciate reading your thoughts on this @LauraW!

  • When a new instance should be created might be the same question as defining an instance. The outcome is the same, I think, as what you are asking for, Laura: a good definition as to what makes a FOLIO instance. Case in point: Slides 12 and 13 of @Kathryn’s presentation show that we have no consensus as to whether the individual volumes within a multi-volume set should each have their own instance or be part of one. Something like this needs to be worked out before criteria can be set.

  • The ISBNs are very problematic. The slides that @Kathryn provided list the problems very well. I’m not sure these can be overcome. There are too many issues. ISBNs are a great identifier in theory, but we have not used them in our cataloging records in a way that would allow us to make a one-to-one relationship of ISBN:Resource.

  • This is leading me to ask myself, maybe prematurely, once we do know exactly when a new instance should be created, what other data do we have that will identify a resource as unique? I can only think that it would have to be a combination of data like author(s), title, publisher, publication date, and for print resources, last numbered page.

  • I think this last one may sound a little nutty, but I even wonder if the criteria were only title and last numbered page to identify a unique (print) instance, if the success rate would be the same as if author(s), title, publisher, date were all used together. Both the title field and the use of the last numbered page are very consistently applied by catalogers. This “formula” could not extend even as far as ebooks, since pagination interpretation is not consistent for those. Again, this is probably putting the cart before the horse. If we know for sure when a new instance should be created, we can try to work on how.
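As a rough illustration of the "title plus last numbered page" idea above, here is a minimal sketch. It assumes the pagination arrives as a free-text extent statement (something like a MARC 300 $a); the function names `normalize_title`, `last_numbered_page`, and `match_key` are hypothetical and not part of FOLIO or the Codex.

```python
import re
from typing import Optional, Tuple


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def last_numbered_page(extent: str) -> Optional[int]:
    """Return the last arabic page count in an extent statement, if any.

    Bracketed (unnumbered) sequences such as "[16] p. of plates" are
    ignored. This is an assumption about how extents are written, not
    a cataloging rule.
    """
    without_brackets = re.sub(r"\[[^\]]*\]", " ", extent)
    pages = re.findall(r"(\d+)\s*(?:pages|pp|p)\b", without_brackets)
    return int(pages[-1]) if pages else None


def match_key(title: str, extent: str) -> Tuple[str, Optional[int]]:
    """Hypothetical match key for print resources: (title, last page)."""
    return (normalize_title(title), last_numbered_page(extent))


# Two differently formatted descriptions yield the same key.
key_a = match_key("The Sun Also Rises.", "xii, 345 p.")
key_b = match_key("the sun also rises", "345 pages")
print(key_a == key_b)
```

Records without a recognizable page count (for example, "1 online resource") get `None` as the second element, which is consistent with the point that this formula does not extend to ebooks.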


#4

Jumping in a little late here, but hopefully can shed some light (or at least share my opinion/understanding) on the conversation so far…

This is the easy question, in my opinion…because I think that it has to be done programmatically rather than through human intervention given the scale of data we’re talking about. Having someone review and make decisions about the creation of an instance during migration, for example, is prohibitive.

As I thought about these issues, my concern about FOLIO “knowing” was always in the context of "Does FOLIO have enough information to make a determination as to 1) when to create an instance, 2) which instance(s) are created, and 3) how the system knows where to “hook” holdings/item data?"


#5

The BIBFRAME Instance definition is a great starting point and was very much in the forefront of folks’ minds as they developed the model. We can discuss this more in an upcoming workshop meeting, but if you use the BIBFRAME definition, you’re well aligned with the way the architecture team views an Instance.


#6

It would be great to discuss this further. I’ve been looking at the BIBFRAME vocabulary. The short definition of instance is “a material embodiment of a work.” Slightly longer: "Instance. A Work may have one or more individual, material embodiments, for example, a particular published form. These are Instances of the Work. An Instance reflects information such as its publisher, place and date of publication, and format." [emphasis mine]

So thinking about e-books (this may be jumping ahead a bit), would a different provider/platform require a different instance? I think this would require the 856 $u to be in the codex.


#7

The simple answer to the original question is that a new Codex instance record is created when a new Source Record is added to the system. For example, loading a new MARC record into Inventory will trigger the creation of a new instance record in the Codex.

The source record may be a MARC record or a record of a different format (e.g. LD or BIBFRAME). A single MARC record may introduce, for example, multiple identifiers (e.g. multiple ISBNs for a monograph). There is no case under which the Codex would attempt to split that into two instance records. The Codex must defer to the original source record on a 1-to-1 basis. If the source record has multiple identifiers, then those will be picked up and the Codex instance record will contain those same multiple identifiers.

The question probably originates with a statement I made the other day regarding how different ISBNs mean different Codex Instance records. What I meant was that different MARC records will introduce different ISBNs and thus create a new Codex instance record for each MARC record. Implicitly, the point was that there would be no attempt to consolidate ISBN records that might represent the same Work into a single Codex instance. I certainly did not intend to imply that multiple ISBNs in a single MARC record would cause a split into multiple Codex instances.
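The 1-to-1 rule described above can be sketched as follows. This is only an illustration under stated assumptions: the class names `SourceRecord`, `CodexInstance`, and `Codex`, and the method `add_source_record`, are hypothetical and do not reflect an actual FOLIO API.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class SourceRecord:
    """Hypothetical stand-in for a MARC (or other format) source record."""
    source_id: str
    title: str
    isbns: List[str]


@dataclass
class CodexInstance:
    """Hypothetical thin instance record pointing back to its source."""
    instance_id: int
    source_id: str
    title: str
    identifiers: List[str]


class Codex:
    def __init__(self) -> None:
        self._next_id = count(1)
        self.instances: List[CodexInstance] = []

    def add_source_record(self, record: SourceRecord) -> CodexInstance:
        # One source record in, exactly one instance out. Identifiers are
        # copied as-is: no splitting on multiple ISBNs, no consolidation
        # with existing instances that happen to share an ISBN.
        instance = CodexInstance(
            instance_id=next(self._next_id),
            source_id=record.source_id,
            title=record.title,
            identifiers=list(record.isbns),
        )
        self.instances.append(instance)
        return instance


codex = Codex()
# A single record with two ISBNs stays a single instance.
first = codex.add_source_record(
    SourceRecord("marc-001", "Emma", ["9780141439587", "0141439580"])
)
# A second record sharing an ISBN still gets its own instance.
second = codex.add_source_record(
    SourceRecord("marc-002", "Emma", ["9780141439587"])
)
print(len(codex.instances), first.identifiers)
```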


#8

FWIW, here is a link to a guideline for inputting new records into WorldCat: “When to Input a New Record,” written by OCLC and its advisory groups.


#9

Hi Vince, a one-to-one equivalency is helpful up to a point. In the case LauraW describes, having two separate instance records might be more confusing/complicating than helpful. Could we see a way that two different source records could be linked to a single instance record? Or maybe this could be solved through the search display functionality?

A specific use case I’m thinking of is where we have purchased an ebook and have a MARC record in the catalog. Then that same ebook becomes available through one (or more) of the ebook packages we purchase - we currently get MARC records for these via Serials Solutions, but I could see a future of us just pointing to the KB or importing KB source records for those…


#10

Vince will likely jump in here, as well, but my understanding is that each source will “contribute” a unique instance record to the Codex and leave the resolution/display to business logic “on top of” the Codex. While we’ll certainly have to discuss and determine that business logic (i.e., when two instances should be “collapsed” into one), I believe that this is the best, most flexible way of handling this situation. Leave the data “pure” and manage its presentation to meet the needs of specific use cases.


#11

Yes, I consider the presentation (or UX) problem to be a separate one. Ultimately, if there are two source records, there will be two instance records at the data level. The primary goal of the Codex instance record is to navigate/search and locate those source records. At the UI level there are opportunities to merge nearly-identical instance records at the presentation layer. When the Work becomes fully functional in the Codex it can be used to connect the two records explicitly and guide a deterministic roll-up by the UI in their presentation.


#12

I’m thinking that title and last numbered page is not quite enough. US and UK editions of the same novel or book of poetry come to mind. There may be variations in spelling within the text, but the title and number of pages may be the same. If the publisher varies though, I think we would want separate instances. So I think we would at least need to add publisher to the criteria. That of course means potentially some “fuzzy” or loose understanding of publisher, so that Norton, WW Norton, W.W. Norton, and W.W. Norton Inc. could all be recognized as the same publisher.
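To make the “fuzzy” publisher idea concrete, here is a deliberately crude sketch. The heuristic (drop periods, corporate suffixes, and short all-caps initials) is an assumption chosen only to handle the Norton variants above; it would mis-handle publishers whose names are themselves short initialisms. The name `publisher_key` is hypothetical.

```python
import re

# Assumed list of suffixes to ignore; a real list would need curation.
CORPORATE_SUFFIXES = {"inc", "co", "company", "ltd", "llc"}


def publisher_key(name: str) -> str:
    """Reduce a publisher statement to a loose comparison key."""
    tokens = re.findall(r"[A-Za-z0-9]+", name.replace(".", ""))
    kept = []
    for token in tokens:
        if token.lower() in CORPORATE_SUFFIXES:
            continue  # "Inc", "Company", ...
        if len(token) <= 2 and token.isupper():
            continue  # initials such as "W", "WW"
        kept.append(token.lower())
    # If everything was filtered out, fall back to all tokens.
    if not kept:
        kept = [token.lower() for token in tokens]
    return " ".join(kept)


variants = ["Norton", "WW Norton", "W.W. Norton", "W.W. Norton Inc."]
keys = {publisher_key(v) for v in variants}
print(keys)
```

All four variants reduce to the same key, while a different publisher such as "Penguin" does not. Whether this kind of normalization belongs in the Codex or in the roll-up logic on top of it is an open question in this thread.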


#13

Also, if separate source records drive separate Codex instances (which I’m fine with), and then some sort of matching/roll-up is used to drive the actual staff display/UX, then even if that roll-up is beyond the scope of the Codex, we need to have some discussion about that roll-up process: what options might be offered, whether there are defaults, whether it’s configurable by individual FOLIO libraries, what could or could not be accommodated in v1 versus future versions, etc.


#14

Some thoughts on when a new instance is minted, and how instances come together again (are rolled up) for administration or discovery. There are a few different ways to look at it… part of it is (perhaps) a migration question, part of it is a question of local policy, part of it might even be a perspective of what “we” as in the FOLIO community, would like to encourage, if not enforce.

First of all, Vince’s response is right – if you wish to add a MARC record with minimal change, then the simplest view of the Codex is that this record is viewed as a brand new Codex instance, and if you want more meat than what is visible through that, you (or your app) can refer to the source record. That is the most fundamental approach and it needs to be thus, I think, to allow people to import their catalog without having to make big and difficult decisions about refactoring their catalogs… at its simplest, the Codex can naively represent a silo of MARC (or any other format) records.

But suppose you wanted to take a step further and actually “refactor” your catalog, to move closer to a model that explicitly recognized works as groupings of instances. A simple way to do that would be, during your import process, to use something like this: https://www.loc.gov/bibframe/mtbf/ to map your MARC records to BIBFRAME and make those records the source. In that process, the mapping might generate Work entities which would make certain instances group together with a shared Work parent. In the first rollout of the Codex, the instances would still be separate because we haven’t made a space for Works yet, but the intent is definitely to do so… when that’s done, you’ll have a model for administratively or algorithmically clustering instances around a shared work within the system. Going forward, you would want your cataloging workflows to similarly honor this clustering… i.e. make it easy when a new Instance is created to link it to a Work… this is made complicated because there are of course many different processes by which instances are added to a catalog, and this completely ignores the electronic KB model where Someone Else is responsible for curating the instances.

The second answer to ‘rolling up’ instances into larger units (works) is, it can be done dynamically by the Discovery layer even if the library is not administratively maintaining works structures. One big advantage of this approach is that the merging/clustering rules can easily be changed over time without impact to cataloging rules or the structure of the catalog. The rules can even be adapted to the audience and application. End-users might prefer much more inclusive/aggressive merging rules than professional librarians.
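A minimal sketch of that dynamic approach: the stored instances never change, and the clustering rule is just a function that can be swapped per audience. The field names, the sample data, and the two key functions (`staff_key`, `patron_key`) are assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Instance = Dict[str, object]


def roll_up(
    instances: List[Instance], key_fn: Callable[[Instance], Hashable]
) -> List[List[str]]:
    """Group instance ids by whatever key the caller supplies."""
    clusters = defaultdict(list)
    for instance in instances:
        clusters[key_fn(instance)].append(instance["id"])
    return list(clusters.values())


def staff_key(instance: Instance) -> Hashable:
    # Conservative rule: only merge what looks like the same edition.
    return (instance["title"], instance["publisher"], instance["year"])


def patron_key(instance: Instance) -> Hashable:
    # Inclusive rule: any edition of the same title by the same author.
    return (instance["title"], instance["author"])


instances = [
    {"id": "in1", "title": "Emma", "author": "Austen, Jane",
     "publisher": "Penguin", "year": 2003},
    {"id": "in2", "title": "Emma", "author": "Austen, Jane",
     "publisher": "Norton", "year": 2011},
    {"id": "in3", "title": "Emma", "author": "Austen, Jane",
     "publisher": "Penguin", "year": 2003},
    {"id": "in4", "title": "Persuasion", "author": "Austen, Jane",
     "publisher": "Penguin", "year": 2003},
]

staff_view = roll_up(instances, staff_key)
patron_view = roll_up(instances, patron_key)
print(staff_view)
print(patron_view)
```

Changing the rule means passing a different `key_fn`; nothing about the underlying records or cataloging practice has to change.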

So if work-level clustering is potentially in the eye of the beholder, if it can be done well by a Discovery ‘crawl’ of your catalog, are there still advantages to administratively curating the work/instance relationships? I think there can be, and if BIBFRAME is the strongest candidate for an intersection point between library craft and the linked open web, then this would seem to support that idea also. If I want a link from the Wikipedia page about a work, it makes more sense to make that link to/from a representation of the work than some random instance. Becoming more mindful about explicit, hard relationships between the different entities in our space (instances, work, people, subjects) is also one path to finally force ourselves to get identifiers right.

This is a long (sorry) way of saying that there’s more than one way to skin a cat in the Codex, and there will be many many more ways than one in FOLIO, where the Codex is just a hub for a particular range of approaches, but it doesn’t have to be the only hub.

As for what we do in FOLIO in the short term, I think we definitely need an implementation of the Codex that will let you ingest and work with a set of MARC-flavored data viewed through the Codex service interface, with access to full MARC for apps that want it. I tend to think we also owe it to ourselves to have an implementation that lets you switch to BIBFRAME if you want it, just because we would learn a lot from thinking about what it would look like.

Full disclosure… we at ID have been working with the LoC on the MARC/BIBFRAME crosswalk, and we created the reference implementation which can be downloaded from that same site. We already went through the exercise of playing with it as a microservice in FOLIO, although without a storage module based on the Codex there hasn’t been much we could use it for.


#15

One thing to think about here… with FOLIO and the Codex there’s a whole huge swath of approaches that you could take when you look at the existing catalogs of people looking to migrate, from “burn it at the door” to “do no harm, make sure people can continue to maintain what they have built over decades.”

Lately I’ve found myself in many conversations with organizations contemplating a jump to FOLIO in the mid-term. I think this tends to shift my attitude towards one of “how can we make the initial apps and models in FOLIO enable people to bring their bibliographic models with them”, rather than requiring them to check their stuff at the door. But at the same time, we really don’t want FOLIO to evolve into another MARC-bound system. We don’t want to perpetuate workflows that say more about limitations of past systems than what’s really, actually good practice. And I would say that nowhere in FOLIO is that balance more complicated to achieve than in bibliographic description. So I just want to give a shout-out to all the folks with deep domain experience who’re diving into these conversations… they’re incredibly important.


#16

To me, it’s important to always keep the overall metadata model in mind.

· Locally stored bibliographic data: could be a single data store of MARC formatted records, several local data stores of MARC formatted records, or local data stores using many various other formats.

· Plus external KB(s) data that is accessed from within FOLIO

· All rolls up into the Codex, a coherent, thin metadata layer that normalizes the data wherever its origin, for the various apps of FOLIO to use.

· The Codex data will be organized somehow, but the library staff will interact with it via the Codex UI, which can influence or act on the Codex data to represent it in a more unified or less complex way than it is actually stored underneath.

· Library staff will also be able to interact with the local MARC data stores, local non-MARC data stores, and external KBs.

· And all of this is just for the back office folks. This is the part I have to keep reminding myself over and over again.

· No matter what it looks like for the back office folks in the Codex, local inventories, and/or external KBs, what the library patron sees will be driven by the Discovery Layer.

· Always important to be keeping track of which piece we’re talking about when we start defining which/how much data, how editable, and what it will look like.

That’s my 2 cents for today.

A-M

Ann-Marie Breaux
Vice President, Workflow Services Product Management
EBSCO Information Services
10 Estes Street
Ipswich, MA 01938
abreaux@ebsco.com
Mobile: +1-678-427-4875