There has been a lot of concern about using UUIDs as the primary identifier in certain circumstances. In ongoing operations, there are a number of areas where people transact in identifiers in analog ways. The purpose of this thought piece is to call out a few types of identifiers that are in use in library systems, to note some of their properties and the circumstances in which they are useful, and to call for a deliberate choice of identifier for each type of record, based on the context in which it is used.
By way of context, the ways in which libraries, their patrons, and their partners transact in identifiers are not to be minimized. For example, when people need to reference a specific bibliographic record unambiguously, they will jot down the bib id or read it over the phone. This happens all the time: when troubleshooting, when helping a patron, and in similar circumstances. In the context of migration, there is a concern that we already have identifiers for many records, and the relationships (in the RDBMS sense) between records are already encoded with them; translating these to UUIDs is a potential source of error. Libraries also often use identifiers in constructing persistent identifiers for bib records or to coordinate with external systems. For example, bib ids are used in persistent identifiers in the public catalog so that patrons have persistent links, professors can make reading lists, and so on. Changing these, even just their format, has repercussions outside of the library.
There are three basic types of identifiers that are of interest:
The first is the code: an alphanumeric string, typically short, which is defined by hand. There may be some system defaults for a specific code, and the implementing site may define more.
Codes are useful when there is a relatively small number of such things. Item status codes, borrower types, location codes, and donor codes often fit this description. The codes often serve as a valuable local shorthand, whether for operational reference or for making SQL queries of the data. The display labels corresponding to the codes may change over time as the context of their display changes, but operational references to the codes are stable and staff learn them quickly.
The second is the sequential number: an identifier that starts at some number and increments automatically as each one is assigned. These are used almost universally for bibliographic records, for example. (Variations would include accession numbers that might begin with a year.) These allow unambiguous reference over time by humans and between systems.
(See post 7 below by @peter.)
The third is the UUID: a 128-bit integer with a particular pattern of hexadecimal display. UUIDs are convenient from the software perspective, as they can be generated without reference to existing storage, but they are miserable for humans to transact in. From that perspective, they are most suitable for cases where humans do not use the identifier directly, for example transactional records like loans. A UUID makes good sense here: the person looking for a loan will typically be looking for the patron or for the item in the loan, often by barcode in either case, and, I believe, will rarely if ever need to make a direct analog reference to the loan record itself.
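To make the contrast concrete, here is a minimal sketch using Python's standard `uuid` module. The variable names and the sample bib id are made up for illustration and are not taken from any existing system.

```python
import uuid

# Generate a random (version 4) UUID. No database or counter is consulted,
# which is what makes UUIDs convenient for software.
loan_id = uuid.uuid4()

# The same value viewed as a 128-bit integer and as its usual display string.
as_integer = loan_id.int
as_text = str(loan_id)

# The display form is 32 hexadecimal digits in 8-4-4-4-12 groups,
# which comes to 36 characters including the hyphens.
print(as_text, len(as_text))

# The display string round-trips back to the same identifier.
print(uuid.UUID(as_text) == loan_id)

# For comparison, a made-up sequential bib id is much easier to jot down
# or read over the phone.
bib_id = 4812377
print(bib_id, len(str(bib_id)))
```

Reading 36 characters aloud is the practical cost described above; the sequential number in this sketch is 7 digits.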
So far UUIDs have been used as the primary identifiers. One appealing feature of the UUID is that it is Universally Unique. However, in order to resolve any UUID in FOLIO, one would need to know what type of object it refers to and which tenant it belongs to in order to query the correct storage module. (Or to query each by brute force.) Any message pointing to an object would need to carry enough contextual information to identify the address of the storage module where that object can be found. So the universally unique attribute would seem to be less important, and the main concern would be uniqueness within that storage module.
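The point about context can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the `storage` dictionary stands in for separate storage modules, and `save`, `resolve`, and `resolve_brute_force` are hypothetical helpers, not FOLIO APIs.

```python
import uuid
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for storage modules: one bucket of records
# per (tenant, record type) pair.
StorageKey = Tuple[str, str]
storage: Dict[StorageKey, Dict[uuid.UUID, dict]] = {}


def save(tenant: str, record_type: str, record: dict) -> uuid.UUID:
    """Store a record in its tenant/type bucket and return its new UUID."""
    record_id = uuid.uuid4()
    storage.setdefault((tenant, record_type), {})[record_id] = record
    return record_id


def resolve(tenant: str, record_type: str, record_id: uuid.UUID) -> Optional[dict]:
    """Look up a record when the caller supplies tenant and type as context."""
    return storage.get((tenant, record_type), {}).get(record_id)


def resolve_brute_force(record_id: uuid.UUID) -> Optional[Tuple[str, str, dict]]:
    """Without context, every bucket has to be searched in turn."""
    for (tenant, record_type), records in storage.items():
        if record_id in records:
            return tenant, record_type, records[record_id]
    return None


loan_id = save("tenant_a", "loans", {"item_barcode": "31234000123456"})
print(resolve("tenant_a", "loans", loan_id))   # found: the context is right
print(resolve("tenant_b", "loans", loan_id))   # None: wrong tenant
print(resolve_brute_force(loan_id))            # found, but only by scanning
```

In this sketch the UUID alone never locates the record directly; the tenant and record type do that work, and the identifier only needs to be unique within its bucket.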
There has been talk about accommodating human-readable IDs as a secondary identifier, tracked in parallel with the UUIDs. It’s not clear that keeping two parallel identifiers for record types that need something human-readable is desirable. It’s a form of double entry of data. And for those operations where the human-readable identifier is useful, APIs would either need to support operations based on both identifiers, or there would always be some lookup involved. In short, we would be talking about adding human-readable identifiers as an afterthought for those records where they are needed, and that would seem to introduce some unnecessary complexity, assuming that the previous conclusion about uniqueness within the storage module holds.
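A rough sketch of what the parallel-identifier arrangement implies is below. The field name `hrid` and the functions are hypothetical, chosen only to illustrate the extra index and the extra lookup step; they do not describe an existing API.

```python
import uuid
from typing import Dict, Optional

# Primary storage keyed by UUID, plus a secondary index keyed by the
# human-readable id. Both must be kept in sync on every write.
records_by_uuid: Dict[uuid.UUID, dict] = {}
uuid_by_hrid: Dict[str, uuid.UUID] = {}


def create_record(hrid: str, title: str) -> uuid.UUID:
    """Create a record that carries both identifiers."""
    if hrid in uuid_by_hrid:
        raise ValueError(f"human-readable id already in use: {hrid}")
    record_id = uuid.uuid4()
    records_by_uuid[record_id] = {"id": str(record_id), "hrid": hrid, "title": title}
    uuid_by_hrid[hrid] = record_id
    return record_id


def get_by_uuid(record_id: uuid.UUID) -> Optional[dict]:
    """Direct access by the primary identifier."""
    return records_by_uuid.get(record_id)


def get_by_hrid(hrid: str) -> Optional[dict]:
    """Access by the human-readable id always goes through one extra lookup."""
    record_id = uuid_by_hrid.get(hrid)
    if record_id is None:
        return None
    return records_by_uuid[record_id]
```

Every operation a person starts from the human-readable id pays the indirection in `get_by_hrid`, and every write has to maintain two structures, which is the complexity questioned above.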
Where does this line of thinking lead?
My current view is that any of these three types of identifiers could be suitable for a given record type, depending on the manner in which the identifier is used. A desirable approach might be, for any given record type, to determine the type of primary identifier based on the context in which that identifier will be used. This could come out of discussion between the developers and the relevant SIG. RMB might be extended to allow the type of identifier to be specified by the developer, and that would be reflected in the relevant JSON schema.
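As a sketch of what a per-record-type choice could look like, the following assigns one of the three identifier types to each record type. The `ID_POLICY` table, the record type names, and `new_identifier` are all hypothetical; this is not how RMB works today, only an illustration of the idea.

```python
import itertools
import uuid
from enum import Enum
from typing import Dict


class IdKind(Enum):
    CODE = "code"              # short, hand-defined alphanumeric code
    SEQUENTIAL = "sequential"  # automatically incrementing number
    UUID = "uuid"              # 128-bit generated identifier


# Hypothetical policy, of the kind developers and a SIG might agree on.
ID_POLICY: Dict[str, IdKind] = {
    "location": IdKind.CODE,
    "bib": IdKind.SEQUENTIAL,
    "loan": IdKind.UUID,
}

# One in-memory counter per sequentially numbered record type.
# A real system would persist this in the storage module.
_counters: Dict[str, itertools.count] = {}


def new_identifier(record_type: str, code: str = "") -> str:
    """Return a primary identifier of the kind configured for record_type."""
    kind = ID_POLICY[record_type]
    if kind is IdKind.CODE:
        if not code:
            raise ValueError("a hand-defined code must be supplied")
        return code
    if kind is IdKind.SEQUENTIAL:
        counter = _counters.setdefault(record_type, itertools.count(1))
        return str(next(counter))
    return str(uuid.uuid4())
```

With this policy, `new_identifier("location", code="MAIN")` returns the supplied code, successive calls to `new_identifier("bib")` return consecutive numbers, and `new_identifier("loan")` returns a fresh UUID string.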
I am happy to discuss and to have holes poked in the above.