Based on the discussion from the Metadata Management SIG meeting on February 9th, this is a topic for discussing the Knowledge Base and Metadata - Scope and Domain document from @marcjohnson. There is a similar thread of discussion in the Resource Management SIG.
Relative to the stated initial goal of “Establish the groundwork for support of data formats beyond MARC …”, what I have read or heard about FOLIO seems to say that any external record, such as MARC21, DC or BIBFRAME, would be translated into a FOLIO-specific data storage format. Is that correct?
Perhaps I’m being premature, but I look at the stated goals, outcomes, and deliverables in the context of existing record conversion, especially of bibliographic, holdings and item records. In that context, I find the “physical item” depending on the “entitlement” to be problematic. It implies to me that at the point of conversion we’d need to provide data on how we’re entitled to a given physical item; we don’t have such data for hundreds of thousands of our physical items, because they were bought while we were using a previous system and the purchase data didn’t come with them into the current environment. Also, I would see Location connected to Physical Item rather than to Entitlement.
On the other hand, I am somewhat reassured by the stated intent to keep the original records in their original storage format, because I don’t know what we’d do with, for example, the mountain of management metadata in our existing holdings records. I picture them being converted piece by piece as standards develop.
The typical workflow at this library (Texas A&M) is to create or update a bibliographic record on OCLC, then export it into the local Voyager catalog. Could you describe how that workflow relates to your use of instance, both internal and external? Would the record on OCLC be the “external instance” and the copy of the record within FOLIO the “internal instance”? On the model, the “derived from” arrow goes from the internal instance to the external one. Shouldn’t there be two arrows, one in each direction? The answer, I guess, depends on whether I have correctly understood the definitions of external and internal instance. If I have, our library, and I imagine many others, derive records from the external instance and load them into the internal one, and almost never derive in the opposite direction.
Could you please explain “reference cataloging”?
Could you give some more detail about the relationship among the elements connected by the diamond, i.e. instance, package, platform, etc.? I think of packages as typically residing on platforms, for example, so I’d be interested in how you see the relationship.
Anne L. Highsmith
Texas A&M University
I would also be interested in clarification on the FOLIO-specific data format. This is also referred to in the FOLIO 2018 release spreadsheet (point 5 of the Metadata Management tab). My initial reaction is: why create yet another metadata format? It seems to me that it would make more sense to facilitate conversion between existing metadata formats rather than produce another one. So I am wondering: is that because there is no desire to build FOLIO on MARC, but there is nothing else as yet that can be used? I would certainly like to know more about this proposed format.
In terms of the conceptual domain model, I would like clarification on what is meant by Instance: is this in the BIBFRAME sense or something else?
Related to that, if external and internal instances are sets of metadata that are either external or internal to the institution, I find the use of the word Instance in this context confusing: surely those sets of metadata would relate to more than just Instances, but again it all depends on what is meant here. If we also have Works, which would presumably be created as part of the FOLIO metadata format (?), then don’t the external/internal metadata sets encompass more than Instances?
I am fine with entitlement deriving from location, but I would have location relating to both the physical item and, if relevant, the electronic item; how that works depends on what is meant by instance (again). The relations with platform/package/subscription also need to be clarified.
It seems that a lot of my questions at this point revolve around the notion of Instance!
Alexandra De Pretto
Data and Systems Librarian
National Library of Scotland
Would it be correct to say that external and internal instances can be added to several places in the diagram? Are the ones that are shown just one possibility? I can see the same structure of external/internal instances feeding into the Work data, the Patron data, and the Platform data.
Also, can multiple external instances contribute to a single instance in this sphere?
Following on from last week’s Metadata Management SIG meeting, where many of these topics were discussed, here is my attempt to share and possibly expand upon my (and perhaps some of @Peter’s) thoughts in response to the questions raised.
I’ve tried to group the questions together to reduce the chance of repeating myself; this may mean I miss some aspects. If I do, please ask follow-up questions or ask me to expand further. Similarly, if any of my thoughts inspire new questions, please feel free to ask.
We’ve had a series of ongoing conversations about this aspect, and we do not intend to create a new metadata format with richness akin to MARC. We believe there is a need to support a variety of formats, including MARC, as first-class citizens, and to be able to adopt new ones as they are created, which likely means making the system less centred on any one specific format.
The goal we have in mind when we talk about FOLIO having its own format is a mechanism that allows the various parts (modules) of the system to receive the information they need to do their work effectively, in the simplest way possible, while at the same time making the richness of specific formats like MARC available to those that need it (and can interpret it). Tuning this balance, so that the majority of modules aren’t overly concerned with the specific underlying format, yet without unintentionally creating a new complex format, will need to be an ongoing endeavour across all of FOLIO’s work in this area.
When the translation between the rich metadata formats and the simpler internal formats used between modules occurs is yet to be decided; our goal is to allow parts of the system to decide this later on. There are a number of scenarios, depending upon a variety of factors including the source of the metadata and whether it will need to be edited. We want to describe these scenarios as early as possible so that the model we design can accommodate them, ideally without major changes in the future.
As an example, the circulation part of the system may only need to understand basic descriptive metadata. A specific internal format for this allows circulation to be interchangeable independently of the bibliographic metadata editing part of the system. Distinct metadata formats may have their own editors, which are also responsible for the translation into these smaller internal formats. We have used Dublin Core as inspiration for the form and attributes this internal format might express.
Another goal of FOLIO is to support a wide variety of resources, for example rare collections, maps or scientific data sets, some of which may be better described by formats other than bibliographic metadata standards like MARC.
Alongside these goals, to encourage adoption it is crucial that we ease migration of the vast quantities of high-quality metadata that many organisations already have. With this in mind, we intend for FOLIO to store the original (source) records, meaning that these can be used for reference and could support incremental migration as FOLIO’s ability to understand them improves.
This means that for imported records, we envisage there being the source record (in the format it was in when imported), the “truthful” system-of-record representation (the rich metadata, likely in the same format as when imported) and the internal, more focused formats used for modules to communicate.
The terminology in this area hasn’t settled down yet; we are especially keen on feedback about how well these terms describe these ideas.
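To make the idea a little more concrete, here is a rough Python sketch (an illustration only, not a FOLIO design) of an editor-side translation from a rich record into a small, Dublin Core-inspired brief format that something like circulation could consume. The `BriefDescription` type, the `marc_to_brief` function and the dict-of-lists representation of a MARC record are all hypothetical simplifications; only the MARC tags used (245 title, 100 personal name main entry, 264/260 date, 020 ISBN) are real.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical record shape: tag -> list of fields, each field a dict of subfield code -> value.
MarcLike = Dict[str, List[Dict[str, str]]]


@dataclass(frozen=True)
class BriefDescription:
    """Hypothetical Dublin Core-inspired internal format (illustrative fields only)."""
    title: str
    creator: Optional[str] = None
    date: Optional[str] = None
    identifier: Optional[str] = None


def first_subfield(record: MarcLike, tag: str, code: str) -> Optional[str]:
    """Return the first non-empty value of subfield `code` in field `tag`, if any."""
    for fld in record.get(tag, []):
        value = fld.get(code)
        if value:
            return value
    return None


def marc_to_brief(record: MarcLike) -> BriefDescription:
    """Translate a simplified MARC-like record into the brief internal format."""
    return BriefDescription(
        title=first_subfield(record, "245", "a") or "[untitled]",
        creator=first_subfield(record, "100", "a"),
        # Prefer 264 $c (RDA-era records), fall back to 260 $c (older records).
        date=first_subfield(record, "264", "c") or first_subfield(record, "260", "c"),
        identifier=first_subfield(record, "020", "a"),
    )


record: MarcLike = {
    "245": [{"a": "Harry Potter and the Philosopher's Stone"}],
    "100": [{"a": "Rowling, J. K."}],
    "260": [{"c": "1997"}],
}
brief = marc_to_brief(record)
print(brief)
```

A module that only consumes `BriefDescription` would not need to change if the underlying record came from a different format, as long as that format's editor supplied its own translation.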
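A minimal sketch of those three representations, assuming hypothetical names (`ImportedRecord`, `system_of_record`, `internal_view`) and a record that has already been parsed into a dict, might look like this:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


# Hypothetical sketch: class and attribute names are illustrative, not FOLIO's.
@dataclass
class ImportedRecord:
    source_format: str   # the format the record arrived in, e.g. "MARC21"
    source: str          # the original record exactly as imported; never edited
    system_of_record: Dict[str, Any] = field(default_factory=dict)  # rich, editable

    def edit(self, key: str, value: Any) -> None:
        # Cataloguing changes go to the system-of-record copy only,
        # so the source stays available for reference and later re-migration.
        self.system_of_record[key] = value

    def internal_view(self, keys: Tuple[str, ...] = ("title", "creator")) -> Dict[str, Any]:
        # A small, derived representation for other modules; recomputed on demand
        # so it cannot drift from the system of record.
        return {k: self.system_of_record.get(k) for k in keys}


rec = ImportedRecord(
    source_format="MARC21",
    source="<original MARC21 record, unchanged>",
    system_of_record={"title": "Old title", "creator": "Smith, Jane", "extent": "xii, 300 p."},
)
rec.edit("title", "Corrected title")
print(rec.internal_view())  # {'title': 'Corrected title', 'creator': 'Smith, Jane'}
```

Deriving the internal view on demand is only one option; when that translation happens is exactly the open question described above.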
I’ve used entitlement as the name for the commonality between electronic and physical items owned by a library. For example, patrons are “entitled” to read a print book (because the library owns a copy) or an electronic book (because of an access agreement the library has). It is not intended to describe how the library acquired this entitlement or the details of licensing. The arrows that go from electronic item and physical item to entitlement are intended to show this commonality, rather than entitlement being a distinct concept in its own right.
Part of the confusion here may be due to my search for a diagramming convention that is broadly understood across a wide range of audiences.
For the purposes of the model, I used location as an example of a common attribute of the resources an organisation is entitled to, in the sense that both electronic and physical items have locations. As these definitions evolve, this may turn out not to be a good example, and it may be replaced by other attributes.
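One way to express “commonality rather than a distinct concept” in code is a structural interface instead of a base class. This hypothetical Python sketch (the class names, attributes and example values are assumptions for illustration) treats location as the shared attribute:

```python
from dataclasses import dataclass
from typing import List, Protocol


class Entitlement(Protocol):
    """Structural interface: the attributes physical and electronic items share.

    A Protocol rather than a base class reflects that entitlement describes
    a commonality, not a separate kind of record.
    """
    instance_id: str
    location: str


@dataclass
class PhysicalItem:
    instance_id: str
    location: str   # e.g. a library, shelf or collection
    barcode: str


@dataclass
class ElectronicItem:
    instance_id: str
    location: str   # e.g. the URL or platform where it can be accessed
    platform: str


def describe(holdings: List[Entitlement]) -> List[str]:
    # Works on any mix of items because it only uses the shared attributes.
    return [f"{h.instance_id} @ {h.location}" for h in holdings]


holdings: List[Entitlement] = [
    PhysicalItem("inst-print-1", "Main Library", "39000001"),
    ElectronicItem("inst-ebook-1", "https://ebooks.example.org/1", "Example Platform"),
]
print(describe(holdings))
```

Note that neither item carries any acquisition or licensing data, in line with entitlement not describing how the library came to hold the resource.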
There is a newer, alternative diagram that is being worked on which expresses the same ideas in a different manner, which may help to describe some of these aspects better. We are looking to evolve a shared understanding of this domain and so feedback is crucial for refining these models.
Instance is meant in the BIBFRAME sense of the word. For example, the hardback first edition of Harry Potter and the Philosopher’s Stone is an instance, and an item represents a physical copy of it.
The terms external and internal may be causing confusion. I’ve experimented with some alternatives and haven’t found a pair that expresses these ideas any better; any suggestions for better names are more than welcome.
The key distinction between internal and external instances is whether the organisation owns or curates that instance, rather than where the record is stored (which we may need to model in another way). This kind of ownership distinction could apply to other concepts, with work, subject and people as possible candidates. The diagram did not attempt to describe those, as it was intended to be a small toe in the water to test whether this distinction makes sense, and we wanted to keep the diagram relatively constrained.
It might be possible for an internal instance to be derived from multiple external instances; I’d be interested to hear examples of that. It would likely make it significantly harder to determine which instance takes precedence for each attribute. An alternative could be a chain of derived instances, each deriving from the former.
The example from Texas A&M could be useful to work through. I believe the model could accommodate it in a number of different ways, depending upon the details of ownership, what role FOLIO takes in relation to these systems and how explicit the relationship between them is. One of those could be that the OCLC record is modelled as an external instance and the Voyager record as an internal instance.
“Reference cataloging” is the idea that the system remembers the external source of the original metadata and can potentially track changes and divergences in that information, so that choices can be made about adopting those changes. The goal is to reduce the effort involved in keeping these records in sync and to allow an organisation’s catalogers to focus more on the aspects of descriptive metadata that are of most interest to their organisation. The crux of this, in the model, is the arrow from internal instance to external instance, which is intended to show that it is the internal instance’s responsibility to remember the external instance it came from (it could potentially go in both directions, which would likely depend on the system managing the external instance).
Combined with the relationship between a work and an instance, these two relationships represent some of the Linked Data aspects of this model, where attributes can flow (“trickle down”) from the less resource- or organisation-specific representations (e.g. Work or External Instance) to the more specific ones.
The specific mechanisms for how this flow could happen are yet to be determined; it is likely to involve a layering of attributes, as Lynn suggested on the call, so that we don’t need to take entire copies of the attributes at each step.
That diamond describes the same relationship you refer to, of instances of electronic resources provided on a platform being part of a package, with a small difference (inspired by GOKb): it allows for the possibility that some packages may have instances on different platforms.
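As an illustration of the change tracking this implies, here is a hypothetical three-way comparison in Python: the internal instance keeps a snapshot of the external instance as it was when derived (`baseline`) and compares it with both the local copy and the current external record. The names, identifiers and the attribute-by-attribute approach are assumptions, not a decided mechanism:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InternalInstance:
    external_id: str                 # which external instance this was derived from
    baseline: Dict[str, str]         # external attributes as they were when derived
    attributes: Dict[str, str] = field(default_factory=dict)  # local working copy


def review_upstream(instance: InternalInstance, external_now: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify each attribute by comparing baseline, local copy and current external record."""
    report: Dict[str, List[str]] = {"adoptable": [], "conflict": [], "local_only": []}
    keys = set(instance.baseline) | set(instance.attributes) | set(external_now)
    for key in sorted(keys):
        base = instance.baseline.get(key)
        local = instance.attributes.get(key)
        upstream = external_now.get(key)
        upstream_changed = upstream != base
        local_changed = local != base
        if upstream_changed and not local_changed:
            report["adoptable"].append(key)   # could be offered for automatic adoption
        elif upstream_changed and local_changed and upstream != local:
            report["conflict"].append(key)    # both sides diverged: a cataloger decides
        elif local_changed and not upstream_changed:
            report["local_only"].append(key)  # local divergence, nothing new upstream
        # Unchanged keys, or both sides changed to the same value, need no action.
    return report


instance = InternalInstance(
    external_id="ext-0001",  # hypothetical identifier
    baseline={"title": "Example title", "date": "1997", "subject": "Wizards"},
    attributes={"title": "Example title", "date": "1997", "subject": "Magic", "note": "Signed copy"},
)
external_now = {"title": "Example title.", "date": "1998", "subject": "Magic -- Fiction"}
report = review_upstream(instance, external_now)
print(report)
```

Keeping the baseline is what lets the system tell “the source changed” apart from “we changed it locally”, which is the distinction catalogers would need in order to decide which changes to adopt.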
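Python's standard `collections.ChainMap` happens to behave very much like the layering described above, so it makes a convenient illustration (not a proposed implementation). The layer contents and the `provenance` helper are made up for the example:

```python
from collections import ChainMap

# Illustrative layers; all values are invented for the example.
work = {"title": "Example Work", "creator": "Author, A.", "language": "eng"}
external_instance = {"publisher": "Example Press", "date": "1997", "extent": "223 p."}
internal_instance = {"extent": "223 p. ; 20 cm", "note": "Signed by the author"}

# Lookups search the layers left to right: local values first, then the
# external instance, then the work. No layer is copied into another.
effective = ChainMap(internal_instance, external_instance, work)


def provenance(key: str) -> str:
    """Report which layer currently supplies the value for `key`."""
    for name, layer in zip(("internal", "external", "work"), effective.maps):
        if key in layer:
            return name
    raise KeyError(key)


print(effective["title"], provenance("title"))    # trickles down from the work
print(effective["extent"], provenance("extent"))  # the local override wins

# A later change upstream is visible immediately, because nothing was copied.
external_instance["date"] = "1998"
print(effective["date"])
```

Because `ChainMap` writes only to its first mapping, local edits land in the internal instance layer and never alter the work or the external instance. A chain of derived instances, as mentioned above, would simply be more layers in the list.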
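A rough sketch of that diamond as a single record tying instance, package and platform together, similar in spirit to the TIPP (title instance package platform) records used by GOKb; the class, function names and example values here are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class PackageEntry:
    """One point of the diamond: an instance, offered in a package, on a platform."""
    instance_id: str
    package: str
    platform: str


def platforms_for_package(entries: List[PackageEntry], package: str) -> Set[str]:
    # The platform is recorded per entry rather than per package,
    # which is what allows one package to span several platforms.
    return {e.platform for e in entries if e.package == package}


def packages_for_instance(entries: List[PackageEntry], instance_id: str) -> Set[str]:
    return {e.package for e in entries if e.instance_id == instance_id}


entries = [
    PackageEntry("journal-a", "Big Deal 2018", "Platform X"),
    PackageEntry("journal-b", "Big Deal 2018", "Platform Y"),
    PackageEntry("journal-a", "Backfile", "Platform X"),
]
print(platforms_for_package(entries, "Big Deal 2018"))
print(packages_for_instance(entries, "journal-a"))
```

If every package lived on exactly one platform, the platform could instead be an attribute of the package; recording it per entry is the small difference mentioned above.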
Examples of external sources that could contribute to a work instance include a bibliographic source, a table of contents source and a cover image source. I also wonder if external authority sources like VIAF would be considered here as well. I may be trying to jam too many things into what should constitute a bibliographic instance. What do you think?
Hi Lisa (@lmccoll_lyu)
I think that is an interesting idea that is worth considering in more detail. I wonder how often the metadata that an organisation has about a resource is actually an amalgamation from multiple places.
Important aspects of this will be whether those sources need to be actively monitored or referenced (they could just be used to create the metadata initially), and whether the information in them may overlap (which is likely to make it harder to automate the handling of changes).