
Version of Unicode supported by FOLIO?

Up to what version of the Unicode Standard does FOLIO currently support? Is it supporting scripts such as Adlam, Yezidi, and Hanifi Rohingya?

Best,
Charles Riley

Hello Charles,

I am not familiar with these variations of Unicode. For Arabic, the standard Arabic code page has so far been sufficient.

Thanks,
Massoud.


Thanks Massoud,

Unicode is a versioned standard, and updates are made about every 1 to 2 years toward more complete representation of the world’s scripts and languages. It is up to Unicode 13.0 now. Newer operating systems are handling the more recently added scripts, including Adlam, but individual software applications sometimes lag behind in their implementation.

Best,
Charles

This is an interesting question with a couple of nuances, I expect. On the one hand, the user interface is all in a web browser and the interactions with the server are JSON documents passed over HTTP. I would expect all of that to be up to the latest version of Unicode. On the back end, almost all of the modules are using Java (Java 8 primarily but moving to Java 11, I’m told). The data is stored in PostgreSQL version 8 in the hosted reference implementation, although there are some sites that are using version 11 and version 12. To the best of my knowledge, all of those components handle whatever the latest version of Unicode is.

Where I wonder if there might be differences is in things like sorting order. For that we would probably have to look at the various routines that are performing the sorting—most likely using a common library in RAML Module Builder. As far as display goes, though, I would expect any of the scripts to display fine. Feel free to try out one of the hosted reference environments and post back what you find.

Thanks! I think even on Java 8, the ICU4J software library can be helpful. I’d like to check out the reference environments, but am not sure how to obtain a login.

Charles

Hey Charles. Take a look at the links under the “Demo Sites” heading on wiki.folio.org. I would recommend the “Current Release” one to ensure you have a stable environment.

Hi Peter. I was able to test a record containing valid Adlam characters from Unicode 9.0. The data included “𞤫𞤬𞤼𞤫𞤪𞤫 𞤨𞤢𞤴𞤳𞤮𞤴”, romanized as “Deftere paykoy”. It was rendered as “�������������� ������������” on import into the FOLIO Goldenrod release. I would have liked to see at least empty boxes, indicating lossless conversion, but what was output was lossy.
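Adlam characters sit outside the Basic Multilingual Plane, so each one takes four bytes in UTF-8 and a surrogate pair in UTF-16, and a step in a pipeline that assumes narrower characters can lose them. The Python sketch below only illustrates that property and is not FOLIO code. `describe` and `round_trips` are hypothetical helper names, and the sample is a single code point from the Adlam block written as an escape.

```python
def describe(text):
    """Return (code point, UTF-8 byte count, UTF-16 code unit count) per character."""
    rows = []
    for ch in text:
        utf8_len = len(ch.encode("utf-8"))
        utf16_units = len(ch.encode("utf-16-le")) // 2
        rows.append((f"U+{ord(ch):04X}", utf8_len, utf16_units))
    return rows


def round_trips(text, encoding):
    """True if encoding and then decoding returns the original text unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


sample = "\U0001E92B"  # one code point from the Adlam block (U+1E900..U+1E95F)
print(describe(sample))                # [('U+1E92B', 4, 2)]
print(round_trips(sample, "utf-8"))    # True
print(round_trips(sample, "latin-1"))  # False: the encoding cannot represent it
```

Running the same kind of check on the bytes at each stage of an import (source file, API payload, stored value) may help narrow down where the characters are lost.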

Best,
Charles

It would be good to have an issue in the project tracker (issues.folio.org) for this, Charles. I’m willing to add it on your behalf if you want me to. This web forum tool is somewhat limiting in the file types it accepts, so you can send the MARC file to me at peter@indexdata.com. Also, describe the sequence of events you used to import the record so we can reproduce it exactly.

Thanks for testing this out.

A sample title in the Yezidi script, which was added in Unicode 13.0, that can be used to test for support is “𐺋𐺣𐺗𐺀𐺩𐺋 𐺀𐺩𐺏𐺀𐺨𐺀𐺢”. Hanifi Rohingya was added in the 11.0 release, and a test string for its characters is “𐴌𐴟𐴗𐴝𐴙𐴣𐴒 𐴧𐴙𐴝”.
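Before blaming an application, it can help to confirm that the runtime used for testing knows these code points at all. The following is a minimal Python sketch, not part of FOLIO. The block ranges come from the Unicode code charts, and `BLOCKS`, `block_of`, `known_to_runtime` and `report` are hypothetical names introduced for illustration. The result of `known_to_runtime` depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Block ranges as published in the Unicode code charts.
BLOCKS = {
    "Adlam": (0x1E900, 0x1E95F),
    "Hanifi Rohingya": (0x10D00, 0x10D3F),
    "Yezidi": (0x10E80, 0x10EBF),
}


def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def known_to_runtime(ch):
    """True if this interpreter's Unicode database has the code point assigned."""
    return unicodedata.category(ch) != "Cn"


def report(text):
    """List (code point, block, known_to_runtime) for each non-space character."""
    return [
        (f"U+{ord(ch):04X}", block_of(ch), known_to_runtime(ch))
        for ch in text
        if not ch.isspace()
    ]


print("Unicode database version:", unicodedata.unidata_version)
# The first code point of the Yezidi block and of the Hanifi Rohingya block.
print(report("\U00010E80 \U00010D00"))
```

A character reported as unknown here points to an outdated runtime rather than to the data, which is a different problem from the lossy conversion seen on import.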

Best,
Charles

Thanks for the details, Charles. I’ve created UXPROD-2685 with the sample file to track this need.

Thanks a lot!

Charles

Hi Charles! Could you look at the comments on https://issues.folio.org/browse/MODDATAIMP-332 and see if the issue is addressed?