Discuss.FOLIO.org is no longer used. This is a static snapshot of the website as of February 14, 2023.

Folio codex use cases

mikeshowalter
26 May '17

I created a draft use case spreadsheet here: Metadata use cases. My assumption was that the use cases should be based on example workflow details that reflect the real world, so to get started I used a couple I found in the RM Google Docs folder.

Then, to get to the actual details of the metadata that would inform what might be required for the FOLIO Codex, I made another spreadsheet for the first item and copied in the details from the record. That document is at https://docs.google.com/spreadsheets/d/16ywTWiQvTjHF3RbGpoZ00rAZ0gqf1F2upUTLWTPW0aY/edit

My working assumption was that we’d need to identify the general use case and then identify the details about the actual source record.

Feel free to provide feedback, add notes, etc. on the format, or start filling in use cases of your own to see if the model fits.

–Mike

Charlotte_Whitt
29 May '17

Hi @mikeshowalter, a quick follow-up question about the spreadsheet of required information for a Codex record: should ‘Sheet 1’ be one format that fits all material types, or would you like us to add new tabs for each material type, e.g., printed books, e-resources, articles, journals (print and electronic), audiovisuals (music, film, etc.), printed music, maps, images, and so on?

Best
Charlotte

mikeshowalter
29 May '17

We could add a column for material type rather than another tab.

Ann-Marie
31 May '17

Hi Mike,

I added some use cases - random ones that came to mind based on our discussion last week. I didn’t include specific titles, but someone else or I can dig some up as needed.

Thanks,
Ann-Marie

Ann-Marie
31 May '17

Hi Mike,

For the metadata/codex worksheet, I added some data elements plus some notes/questions. I renamed the first sheet “MARC data”, since we’ll likely need other worksheets for other source data formats.

My viewpoint is books, so others will need to add info/data elements for other material types.

Thanks,
Ann-Marie

kmarti
1 Jun '17

Hi @mikeshowalter
I looked through the Metadata use cases spreadsheet and added some comments on potential issues, along with some thoughts. I think I’m still focused on workflow issues and haven’t yet made the leap to setting up the required elements for the FOLIO Codex. I looked at the second spreadsheet with the specific example for Tito in tovariši, and I’m not clear whether you want a separate tab of elements for each use case, or a column for each use case so we can see how the model fits and whether we need different elements. I haven’t made any changes to this spreadsheet, since I’m having some trouble approaching it.

Also, I’m trying to think about how some of the different acquisitions and ERM resource records would relate to the bibliographic records, if we are grouping them all together in the Codex. For example, if my library purchased the Project MUSE UPC 2016 ebook collection, we might pay for that collection as one big bundle and, under the current system, have to create a dummy bib for Project MUSE UPC 2016 ebook collection. Then we’d load the bib records for the individual titles, and might use a locally defined MARC field to associate them with the particular package. But the system wouldn’t have a direct connection between the individual titles and the dummy bib with payment info. Are we imagining a single overarching Codex record type that could both be the package record for the ebooks and represent each individual title? Maybe they would be differentiated by tags, with the user filling out the appropriate fields and some mechanism to indicate relationships? Or would we rather have different record types within the Codex to represent the package and the individual title?
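To make the second option (separate record types for a package and its titles, with an explicit link between them) easier to picture, here is a minimal Python sketch. `PackageRecord`, `TitleRecord`, and their fields are hypothetical and not part of any FOLIO specification; the point is only that payment information sits on the package and each title points to it, instead of the link living in a locally defined MARC field.

```python
from dataclasses import dataclass, field


@dataclass
class PackageRecord:
    """Hypothetical Codex record for a purchased bundle, e.g. an e-book collection."""
    package_id: str
    title: str
    payment_info: dict = field(default_factory=dict)  # acquisitions data lives here


@dataclass
class TitleRecord:
    """Hypothetical Codex record for one title delivered through one or more packages."""
    title_id: str
    title: str
    package_ids: list = field(default_factory=list)  # explicit links to packages


def titles_in_package(package, titles):
    """Return every title record that links to the given package."""
    return [t for t in titles if package.package_id in t.package_ids]


muse = PackageRecord("pkg-1", "Project MUSE UPC 2016 ebook collection",
                     {"fund": "example-fund"})
titles = [
    TitleRecord("t-1", "Title A", ["pkg-1"]),
    TitleRecord("t-2", "Title B"),
]
print([t.title for t in titles_in_package(muse, titles)])  # ['Title A']
```

The reverse question (“what did we pay for this title?”) follows the same link in the other direction, from a title’s `package_ids` to the package’s `payment_info`.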

Within our current system, we use the bibliographic record for the descriptive data, but we store a lot of administrative data there as well, whether through bib statuses, statistical codes, or data coded within MARC 9xx fields. I think this administrative data will need to be considered for the Codex as well.

peter
1 Jun '17

I think it is going to be important for us to move from strings to identifiers as much as we can for FOLIO Codex fields. For instance, an international standard resource identifier could be an ISBN, ISSN, ISCI, DOI, or any of a whole host of other possibilities. Rather than dumping undifferentiated strings into a field, each field value should stand alone as a unique identifier, e.g., urn:ISBN:9789612318444, urn:ISSN:1560-1560, and https://doi.org/10.5438/55E5-T5C0. We would want the component that converts the source record into the Codex record to be responsible for ensuring that the translated data has these identifiers. (The MARC converter would know how to deal with MARC fields, the MODS converter would know how to deal with MODS, and so forth.)
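The MARC side of such a converter could look something like the minimal Python sketch below. It assumes a MARC record has already been parsed into `(tag, ind1, subfields)` tuples (a hypothetical representation, not any particular MARC library’s API) and handles only three cases: ISBN-13 in 020 $a, ISSN in 022 $a, and DOIs in 024 with first indicator 7 and $2 doi. Check digits are validated so that a malformed string is dropped rather than passed along as an identifier.

```python
import re
from typing import Optional


def normalize_isbn13(raw: str) -> Optional[str]:
    """Return a bare 13-digit ISBN if the check digit is valid, else None."""
    isbn = raw.replace("-", "")
    if not re.fullmatch(r"\d{13}", isbn):
        return None
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn[:12]))
    return isbn if (10 - total % 10) % 10 == int(isbn[12]) else None


def normalize_issn(raw: str) -> Optional[str]:
    """Return an ISSN in NNNN-NNNC form if the check digit is valid, else None."""
    digits = raw.replace("-", "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", digits):
        return None
    # ISSN check digit: weights 8 down to 2, modulo 11; a value of 10 is written as X.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return f"{digits[:4]}-{digits[4:]}" if digits[7] == expected else None


def identifiers_from_marc(fields):
    """Turn (tag, ind1, subfields) tuples into identifier URNs/URIs.

    Only 020 $a (ISBN-13), 022 $a (ISSN), and 024 with first indicator 7
    and $2 "doi" are handled; every other field is ignored.
    """
    ids = []
    for tag, ind1, subfields in fields:
        words = subfields.get("a", "").split()
        value = words[0] if words else ""  # drop qualifiers such as "(pbk.)"
        if tag == "020" and (isbn := normalize_isbn13(value)):
            ids.append(f"urn:ISBN:{isbn}")
        elif tag == "022" and (issn := normalize_issn(value)):
            ids.append(f"urn:ISSN:{issn}")
        elif tag == "024" and ind1 == "7" and subfields.get("2", "").lower() == "doi" and value:
            ids.append(f"https://doi.org/{value}")
    return ids


record = [
    ("020", " ", {"a": "9789612318444 (pbk.)"}),
    ("022", " ", {"a": "1560-1560"}),
    ("024", "7", {"a": "10.5438/55E5-T5C0", "2": "doi"}),
]
print(identifiers_from_marc(record))
# ['urn:ISBN:9789612318444', 'urn:ISSN:1560-1560', 'https://doi.org/10.5438/55E5-T5C0']
```

A MODS converter would expose the same kind of function over MODS elements, so everything downstream of the converters sees only the normalized identifiers.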

nkuitse
8 Jun '17

The Metadata Management SIG met today to begin a discussion of these use cases; I’ll add some of my own thoughts below, and maybe others will want to join in.

lmccoll_lyu
8 Jun '17

Thank you for setting this up, Paul.

nkuitse
8 Jun '17

Use cases for batch operations

One set of related use cases that are important to the workflows at our consortium is adding, deleting, and updating e-resource records in batches; batch sizes vary from one record to more than 100,000. We obtain these records either directly from vendors or indirectly through OCLC; the former are generally added in a manual workflow (performed by a systems librarian, not a cataloger), while the latter are done in automated overnight processes.

I’ll first describe how we do things now, then – in later replies – attempt to map that onto FOLIO in two different ways: (1) a vendor-neutral approach in which access to an e-resource from multiple vendors or on multiple platforms is represented by a single Instance record with multiple attached holdings; and (2) a simpler(?) approach using multiple Instances.

Background

Our ILS is Voyager, which makes a clear distinction between holdings (MARC-format holdings records, a.k.a. MFHDs) and items (the things that can have barcodes and be checked out). A bib record can have any number of holdings records, and a holdings record can have any number of items.

Since I’ll just be talking about e-resources, and since we don’t create items for e-resources, my notes will be limited to bib records and holdings records.

How records are loaded now

We take a vendor-neutral approach as much as possible, but keep print resources separate from e-resources. (I won’t get into how we accomplish this as it’s not particularly relevant.)

We use OCLC WorldShare Collection Manager to manage a number of large e-resource collections that we (the Fenway Libraries Online consortium office) load on behalf of our member libraries. I’ll use Ebrary Academic Complete and JSTOR’s DDA collection as examples, because we’ve found there is some overlap and because that overlap introduces some complications.

Every night, an automated process retrieves any new files of added, deleted, or updated records from OCLC’s FTP site. A script then prepares the records for loading by adding or removing fields, identifying any matching records in our catalog using ISBNs, LCCNs, and OCLC numbers (035 subfield $a in the MARC record), and synthesizing a field that our ILS’s batch-load utility then uses as the sole matchpoint. This gives us much-needed control over how records are loaded.
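The record-prep step can be sketched in Python as follows. This is a minimal illustration, not our actual script: it assumes a record is a dict of MARC tag to a list of $a values, that the catalog lookup is a dict from a normalized match key to the bib IDs that carry it, and that the synthesized matchpoint is a single string (the `bib:`/`new`/`review` values are hypothetical). What to do when several bibs match isn’t covered above; sending those to a person is an assumption. LCCN normalization is simplified to removing spaces.

```python
import re

# OCLC numbers in 035 $a look like "(OCoLC)ocm00012345"; keep only the number.
OCLC_RE = re.compile(r"^\(OCoLC\)(?:ocm|ocn|on)?0*(\d+)$")


def match_keys(record):
    """Collect normalized (kind, value) match keys from ISBNs, LCCNs, and OCLC numbers."""
    keys = set()
    for value in record.get("020", []):
        parts = value.split()
        if parts:
            keys.add(("isbn", parts[0].replace("-", "")))
    for value in record.get("010", []):
        keys.add(("lccn", value.replace(" ", "")))
    for value in record.get("035", []):
        m = OCLC_RE.match(value.strip())
        if m:
            keys.add(("oclc", m.group(1)))
    return keys


def synthesize_matchpoint(record, index):
    """Return the value for a hypothetical local field used as the sole matchpoint."""
    matches = set()
    for key in match_keys(record):
        matches |= index.get(key, set())
    if len(matches) == 1:
        return "bib:" + next(iter(matches))
    if not matches:
        return "new"
    return "review"  # assumption: ambiguous matches are resolved by a person


index = {("isbn", "9789612318444"): {"1001"}, ("oclc", "12345"): {"1001"}}
incoming = {"020": ["9789612318444 (pbk.)"], "035": ["(OCoLC)ocm00012345"]}
print(synthesize_matchpoint(incoming, index))  # bib:1001
```

Collapsing all match logic into one synthesized value is what lets the batch loader stay simple: it only ever compares a single field.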

New records are matched against our catalog. If a match is found, a new holding record is added to the existing bib; otherwise, a new bib record is created and a new holding record is added to it.
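Once matching has been done, the rule for new records is small. A minimal Python sketch, assuming the catalog is a plain dict of bib ID to `{"marc": ..., "holdings": [...]}` where each holding is represented only by its 852 location code (a deliberate simplification; the bib IDs and the location codes `jsdda` and `ebrac` are made up):

```python
import itertools


def load_new_record(catalog, record, matched_bib_id, location, new_ids):
    """Attach a new holding to the matched bib, creating the bib first if nothing matched.

    catalog: dict mapping bib ID -> {"marc": record, "holdings": [location codes]}
    new_ids: an iterator that yields unused bib IDs
    """
    if matched_bib_id is None:
        bib_id = next(new_ids)
        catalog[bib_id] = {"marc": record, "holdings": []}
    else:
        bib_id = matched_bib_id
    catalog[bib_id]["holdings"].append(location)
    return bib_id


catalog = {1001: {"marc": {"245": ["Title A"]}, "holdings": ["jsdda"]}}
ids = itertools.count(2000)
load_new_record(catalog, {"245": ["Title A"]}, 1001, "ebrac", ids)  # match: holding only
load_new_record(catalog, {"245": ["Title C"]}, None, "ebrac", ids)  # no match: new bib 2000
print(sorted(catalog))            # [1001, 2000]
print(catalog[1001]["holdings"])  # ['jsdda', 'ebrac']
```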

Deletions work similarly, except holding records are deleted instead of being added; if the last holding record on a bib is deleted, our ILS automatically deletes the bib record. Of course, we don’t want to delete (for example) a JSTOR DDA holding record if what we’re processing is a batch of Ebrary Academic Complete deletions, but our ILS is able to distinguish different holding records based on the location code in the 852 field – only a holdings record with the appropriate code is deleted.
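The deletion rule, including the ILS’s automatic removal of a bib that has lost its last holding, might be sketched like this in Python. It uses the same simplified, hypothetical representation as above (bib ID to `{"marc": ..., "holdings": [location codes]}`), with made-up location codes:

```python
def delete_holding(catalog, bib_id, location):
    """Delete only the holding(s) with this 852 location code on one bib.

    Mirrors the behavior described above: once the last holding is gone,
    the bib record itself is removed.
    """
    bib = catalog.get(bib_id)
    if bib is None:
        return
    bib["holdings"] = [loc for loc in bib["holdings"] if loc != location]
    if not bib["holdings"]:
        del catalog[bib_id]


catalog = {
    1001: {"marc": {}, "holdings": ["jsdda", "ebrac"]},
    1002: {"marc": {}, "holdings": ["ebrac"]},
}
for bib_id in (1001, 1002):  # a batch of Ebrary Academic Complete deletions
    delete_holding(catalog, bib_id, "ebrac")
print(catalog)  # {1001: {'marc': {}, 'holdings': ['jsdda']}}
```

The JSTOR DDA holding on bib 1001 survives because only the `ebrac` location code is targeted.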

Updates are trickier. As with new records, the first step is to identify matching bib records in our catalog using ISBNs, etc. We can’t blithely update (i.e., replace) the matching bib records, however, because there’s no certainty that the matching bib record came from OCLC in the first place; instead, the record prep code performs a secondary “unmatching” step that eliminates (i.e., ignores) existing bib records that were not created as part of a new record load for the same collection – e.g., we don’t replace a bib record when loading Ebrary Academic Complete updates from OCLC unless the bib record was itself created in an earlier batch of new Ebrary Academic Complete records from OCLC.
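The “unmatching” step depends on knowing where each bib came from. A minimal Python sketch, assuming each bib carries a hypothetical `created_by` value recorded when a new-record load for a collection created it (the collection labels are also made up):

```python
def apply_updates(catalog, incoming, candidate_bib_ids, collection):
    """Replace candidate bibs with the incoming record, skipping ("unmatching")
    any bib that was not created by a new-record load for the same collection.
    """
    replaced = []
    for bib_id in candidate_bib_ids:
        bib = catalog.get(bib_id)
        if bib is None or bib.get("created_by") != collection:
            continue  # unmatched: bib came from another collection or was cataloged locally
        bib["marc"] = incoming
        replaced.append(bib_id)
    return replaced


catalog = {
    1001: {"marc": {"245": ["Old"]}, "created_by": "ebrary-ac"},
    1002: {"marc": {"245": ["Local"]}, "created_by": None},  # not from an OCLC load
}
print(apply_updates(catalog, {"245": ["New"]}, [1001, 1002], "ebrary-ac"))  # [1001]
```

In FOLIO terms, this implies that provenance (which load, for which collection, created a record) needs to be stored somewhere queryable.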

We also occasionally need to delete, or otherwise manipulate, all of the records for a particular collection – for example, if we drop an e-book collection. This requires identifying all of the records, which we do in an ILS-specific way that I won’t describe, but we would want FOLIO to have an analogous mechanism.
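One possible analogous mechanism, sketched in Python under the assumption that loads can report what they touched: keep a registry of which bib records each collection load added, so that dropping a collection starts from a simple lookup rather than a search. `CollectionMembership` and the collection labels are hypothetical.

```python
from collections import defaultdict


class CollectionMembership:
    """Hypothetical registry of which bib records each collection load touched."""

    def __init__(self):
        self._members = defaultdict(set)

    def added(self, collection, bib_id):
        self._members[collection].add(bib_id)

    def removed(self, collection, bib_id):
        self._members[collection].discard(bib_id)

    def members(self, collection):
        """All bib IDs currently attributed to the collection, in a stable order."""
        return sorted(self._members.get(collection, ()))


registry = CollectionMembership()
registry.added("jstor-dda", 1001)
registry.added("jstor-dda", 1003)
registry.added("ebrary-ac", 1001)
print(registry.members("jstor-dda"))  # [1001, 1003]
```

Dropping a collection would then mean iterating over `registry.members(...)` and deleting that collection’s holdings on each bib, leaving bibs that other collections still hold untouched.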

nkuitse
8 Jun '17

Collection overlap

One thing that is important to us is the ability to suppress (in our OPAC) holdings for a DDA e-book when we have paid-for access via a license or subscription from another vendor. This saves our member libraries some money. We’ve managed to develop a process that automatically suppresses (for example) JSTOR DDA holdings when we add holdings for Ebrary Academic Complete on the same bib record. It’s not necessary for this to be fully automatic – it just happens that we’re able to make it so – but we would want FOLIO to support something like this.

The use case for this would be something like “Identify, and delete or suppress, DDA e-resources that duplicate non-DDA e-resources.” This could be accomplished either solely at the Instance level (suppress Instance record X if it’s for a DDA title and if we have another Instance record Y for the same title available through some other, non-DDA channel) or at the Holdings level (suppress Holding X if it’s DDA and if there’s a Holding Y on the same Instance that’s not DDA). The former would be more difficult to implement, it seems to me; I consider this a minor, but significant, argument for having a single Instance with separate Holding records when an e-resource is available on multiple platforms – but of course this also complicates things.
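The Holdings-level variant is easy to express when all holdings for a title hang off one Instance. A minimal Python sketch, assuming a hypothetical `Holding` type with a DDA flag and a suppression flag (location codes are made up):

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """Hypothetical holding on an Instance."""
    location: str
    is_dda: bool
    suppressed: bool = False


def suppress_redundant_dda(holdings):
    """Suppress DDA holdings on one Instance whenever it also has a non-DDA holding.

    DDA holdings are unsuppressed again if no non-DDA holding remains.
    """
    has_paid_access = any(not h.is_dda for h in holdings)
    for h in holdings:
        if h.is_dda:
            h.suppressed = has_paid_access
    return holdings


instance_holdings = [Holding("jsdda", is_dda=True), Holding("ebrac", is_dda=False)]
suppress_redundant_dda(instance_holdings)
print([(h.location, h.suppressed) for h in instance_holdings])
# [('jsdda', True), ('ebrac', False)]
```

The Instance-level variant would first have to decide which separate Instances describe “the same title”, which is essentially the matching problem again; that is why it looks harder.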

nkuitse
15 Jun '17

More thoughts and questions about batch operations

Some of this is a little far afield, but mostly it’s relevant to potential Codex use cases.

I’ll refer to records that we want to add to FOLIO (e.g., MARC records from a vendor) as incoming records.

Once they’re added to FOLIO, they become source records.

Are source records immutable, or can we change them? If we can, do we keep a copy of the record as it existed before the change? (Or, equivalently, do we remember what changed so that we can reconstruct it?)
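One way to answer “yes, they can change, and we remember what changed” is to never overwrite a source record in place, but append each changed version instead. A minimal Python sketch under that assumption; `SourceRecordStore` is hypothetical and records are plain dicts of tag to values:

```python
import copy


class SourceRecordStore:
    """Hypothetical store where source records are never overwritten in place."""

    def __init__(self):
        self._versions = {}  # record_id -> list of record versions, oldest first

    def add(self, record_id, record):
        self._versions[record_id] = [copy.deepcopy(record)]

    def update(self, record_id, change):
        """Apply change (a function returning the modified record) to a copy of
        the latest version and append the result as a new version."""
        new = change(copy.deepcopy(self._versions[record_id][-1]))
        self._versions[record_id].append(new)
        return new

    def current(self, record_id):
        return self._versions[record_id][-1]

    def version(self, record_id, n):
        return self._versions[record_id][n]


def add_local_note(record):
    record["590"] = ["Local note"]
    return record


store = SourceRecordStore()
store.add("r1", {"245": ["Title"]})
store.update("r1", add_local_note)
print(store.current("r1"))     # {'245': ['Title'], '590': ['Local note']}
print(store.version("r1", 0))  # {'245': ['Title']}
```

Storing diffs instead of full copies would be the “equivalently, remember what changed” option; full copies are just simpler to show.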

An incoming record doesn’t change when it becomes a source record – even if it’s not byte-for-byte identical, I would presume it should remain semantically equivalent, i.e., there’s a one-to-one mapping (as, for example, between ISO 2709 MARC 21 records and MARCXML records). Because of this, you can (potentially) use a single piece of code to perform a particular operation on both – e.g., to make a particular change in a bunch of incoming records before loading them or to make the same change in a bunch of source records (assuming they can be changed).
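If both incoming and source records are parsed into the same in-memory form, one operation really can serve both. A minimal Python sketch, assuming a hypothetical `Record` type of MARC tag to field values, with stripping local 9xx fields as the example operation:

```python
from typing import Callable, Dict, List

Record = Dict[str, List[str]]  # hypothetical shared form: MARC tag -> field values


def strip_fields(prefix: str) -> Callable[[Record], Record]:
    """Build an operation that drops every field whose tag starts with prefix."""
    def op(record: Record) -> Record:
        return {tag: vals for tag, vals in record.items() if not tag.startswith(prefix)}
    return op


def apply_batch(records: List[Record], op: Callable[[Record], Record]) -> List[Record]:
    """Apply one operation to every record in a batch, returning new records."""
    return [op(r) for r in records]


drop_local = strip_fields("9")
incoming = [{"245": ["Title A"], "949": ["vendor data"]}]  # before loading
source = [{"245": ["Title B"], "999": ["local data"]}]     # already in FOLIO
print(apply_batch(incoming, drop_local))  # [{'245': ['Title A']}]
print(apply_batch(source, drop_local))    # [{'245': ['Title B']}]
```

The same trick does not carry over to Instance, Holdings, and Item records without a source record, since those have a different shape.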

Anyhow, there will also be (we presume) Codex instances, holdings, and items that do not have a source record, and supporting batch operations on those will be different – perhaps a different set of operations will be supported, but even if not, the implementation of an operation that affects incoming or source records will have to be different from the implementation (of the same operation) that affects instance/holding/item records.

To support matching, FOLIO Instances will presumably have to contain all the fields that are likely to be used as matchpoints – ISBN, ISSN, LCCN, OCLC number, etc.

Sorry for the brain dump – these things have been percolating for a while without ever seeming quite fully brewed! 🙂