Use cases for batch operations
One set of related use cases that is important to the workflows at our consortium is adding, deleting, and updating e-resource records in batches; batch sizes vary from one record to more than 100,000. We obtain these records either directly from vendors or indirectly through OCLC; the former are generally loaded in a manual workflow (performed by a systems librarian, not a cataloger), while the latter are handled by automated overnight processes.
I’ll first describe how we do things now, then – in later replies – attempt to map that onto FOLIO in two different ways: (1) a vendor-neutral approach in which access to an e-resource from multiple vendors or on multiple platforms is represented by a single Instance record with multiple attached holdings; and (2) a simpler(?) approach using multiple Instances.
Background
Our ILS is Voyager, which makes a clear distinction between holdings (MARC-format holdings records, a.k.a. MFHDs) and items (the things that can have barcodes and be checked out). A bib record can have any number of holdings records, and a holdings record can have any number of items.
Since I’ll just be talking about e-resources, and since we don’t create items for e-resources, my notes will be limited to bib records and holdings records.
How records are loaded now
We take a vendor-neutral approach as much as possible, but keep print resources separate from e-resources. (I won’t get into how we accomplish this as it’s not particularly relevant.)
We use OCLC WorldShare Collection Manager to manage a number of large e-resource collections that we (the Fenway Libraries Online consortium office) load on behalf of our member libraries. I’ll use Ebrary Academic Complete and JSTOR’s DDA collection as examples, because we’ve found there is some overlap and because that overlap introduces some complications.
Every night, an automated process retrieves any newly posted files of new, deleted, or updated records from OCLC’s FTP site. A script then prepares the records for loading: it adds or removes fields, identifies any matching records in our catalog using ISBNs, LCCNs, and OCLC numbers (035 subfield $a in the MARC record), and synthesizes a field that our ILS’s batch-load utility then uses as the sole matchpoint. This gives us much-needed control over how records are loaded.
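To make the matching step concrete, here is a rough Python sketch, not our actual record prep code. The function names, the shape of the catalog (a dict of bib IDs to identifier lists), the normalization rules, and the idea that the synthesized matchpoint is simply the matching bib's ID are all assumptions made for illustration.

```python
import re

def normalize(kind, value):
    """Reduce an identifier to a comparable key (deliberately naive rules)."""
    value = value.strip()
    if kind == "oclc":
        # Drop prefixes such as "(OCoLC)" or "ocm" and any leading zeros.
        return kind, re.sub(r"\D", "", value).lstrip("0")
    if kind == "isbn":
        # Keep only the first token, so qualifiers like "(ebook)" are ignored.
        return kind, value.split(" ")[0].replace("-", "").upper()
    # LCCN and anything else: remove whitespace and lowercase.
    return kind, "".join(value.split()).lower()

def build_index(catalog):
    """Map each normalized identifier to the set of bib IDs that carry it."""
    index = {}
    for bib_id, identifiers in catalog.items():
        for kind, value in identifiers:
            index.setdefault(normalize(kind, value), set()).add(bib_id)
    return index

def find_matches(index, identifiers):
    """Return the bib IDs sharing at least one identifier with the incoming record."""
    matches = set()
    for kind, value in identifiers:
        matches |= index.get(normalize(kind, value), set())
    return matches

def synthesize_matchpoint(matches):
    """Return a single matchpoint value, or None when there is no unique match.

    Using the bib ID as the matchpoint is an assumption of this sketch.
    """
    if len(matches) == 1:
        return str(next(iter(matches)))
    return None
```

The point of the design is that all of the fuzzy matching logic lives in the prep script, so the loader itself only ever sees one unambiguous matchpoint. How ambiguous matches (more than one candidate bib) should be resolved is left to the caller here.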
New records are matched against our catalog. If a match is found, a new holdings record is added to the existing bib; otherwise, a new bib record is created and a new holdings record is added to it.
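The add-or-create decision can be sketched in a few lines of Python. This is only an illustration: the in-memory catalog, the `load_new_record` name, and the use of a location code to label each holdings record are assumptions, not a description of how Voyager's loader works internally.

```python
def load_new_record(catalog, identifiers, location):
    """Attach a holdings record to a matching bib, or create a new bib.

    catalog maps bib_id -> {"identifiers": set, "holdings": list of location codes}.
    Returns the ID of the bib that received the new holdings record.
    """
    for bib_id, bib in catalog.items():
        if bib["identifiers"] & identifiers:
            # Match found: add a holdings record to the existing bib.
            bib["holdings"].append(location)
            return bib_id
    # No match: create a new bib and give it its first holdings record.
    new_id = max(catalog, default=0) + 1
    catalog[new_id] = {"identifiers": set(identifiers), "holdings": [location]}
    return new_id

# Hypothetical usage: the same title arrives from two collections.
catalog = {}
first = load_new_record(catalog, {"isbn:9780306406157"}, "ebrary")
second = load_new_record(catalog, {"isbn:9780306406157", "oclc:12345"}, "jstor")
```

After the two calls, the catalog holds one bib with two holdings records, which is the vendor-neutral outcome described above.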
Deletions work similarly, except that holdings records are deleted instead of being added; if the last holdings record on a bib is deleted, our ILS automatically deletes the bib record. Of course, we don’t want to delete (for example) a JSTOR DDA holdings record if what we’re processing is a batch of Ebrary Academic Complete deletions, but our ILS is able to distinguish different holdings records based on the location code in the 852 field – only a holdings record with the appropriate code is deleted.
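Here is a minimal Python sketch of that deletion rule. The catalog structure and the `delete_holdings` name are hypothetical, and the sketch removes the empty bib explicitly, whereas in our setup the ILS does that on its own.

```python
def delete_holdings(catalog, bib_id, location):
    """Delete only the holdings record(s) with the given location code.

    catalog maps bib_id -> list of location codes, one per holdings record.
    Returns True if anything was deleted.
    """
    holdings = catalog.get(bib_id)
    if holdings is None:
        return False
    remaining = [loc for loc in holdings if loc != location]
    if len(remaining) == len(holdings):
        # No holdings record for this collection; leave the bib alone.
        return False
    if remaining:
        catalog[bib_id] = remaining
    else:
        # Last holdings record gone: the bib goes too.
        del catalog[bib_id]
    return True

# Hypothetical usage: bib 1 is held through both collections, bib 2 through one.
catalog = {1: ["ebrary", "jstor"], 2: ["ebrary"]}
delete_holdings(catalog, 1, "ebrary")  # bib 1 keeps its JSTOR holdings record
delete_holdings(catalog, 2, "ebrary")  # bib 2 is removed entirely
```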
Updates are trickier. As with new records, the first step is to identify matching bib records in our catalog using ISBNs, etc. We can’t blithely update (i.e., replace) the matching bib records, however, because there’s no certainty that a matching bib record came from OCLC in the first place. Instead, the record prep code performs a secondary “unmatching” step that eliminates (i.e., ignores) existing bib records that were not created as part of a new-record load for the same collection. For example, we don’t replace a bib record when loading Ebrary Academic Complete updates from OCLC unless the bib record was itself created in an earlier batch of new Ebrary Academic Complete records from OCLC.
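The “unmatching” step amounts to a provenance filter. In the Python sketch below, the `created_by_load` key is a made-up stand-in for however a bib record's originating load is recorded; I'm not describing how we actually track that.

```python
def unmatch(matches, collection):
    """Keep only the matched bibs that were created by a new-record load
    for the same collection; everything else is ignored, not replaced."""
    return [bib for bib in matches if bib.get("created_by_load") == collection]

# Hypothetical usage: three bibs matched on ISBN, but only one is ours to replace.
matches = [
    {"bib_id": 1, "created_by_load": "ebrary-academic-complete"},
    {"bib_id": 2, "created_by_load": "jstor-dda"},
    {"bib_id": 3},  # e.g. a bib with no recorded load at all
]
replaceable = unmatch(matches, "ebrary-academic-complete")
```

Only bib 1 survives the filter, so only bib 1 would be replaced by the incoming Ebrary Academic Complete update.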
We also occasionally need to delete, or otherwise manipulate, all of the records for a particular collection – for example, if we drop an e-book collection. This requires identifying all of the records, which we do in an ILS-specific way that I won’t describe, but we would want FOLIO to have an analogous mechanism.