Internationalization birds-of-a-feather discussion


#1

One of the great things about the FOLIO project has been participation from across the world. One of the challenges to the FOLIO project is to make it possible to run the software in all parts of the world. To provide a focal point for this activity, I’m proposing the formation of an Internationalization special interest group. A draft proposal for engagement is in a Google doc and is open to editing and suggestions from anyone. The purpose of the Internationalization SIG (at the time I’m posting this message) is:

Review user interface design, data storage, and OKAPI platform design for issues related to internationalization of the user interface and underlying data storage. Internationalization is the process of designing a software application so that it can potentially be adapted to various languages and regions without engineering changes. Localization is the process of adapting internationalized software for a specific region or language by adding locale-specific components and translating text. The SIG ensures the FOLIO platform is ready for localization by developers and implementers, including translation of strings; localization of currency, date, address, and telephone number formats; Unicode support and display of stored data (when supplied in multiple languages); and tokenization and sorting of strings. The SIG also coordinates internationalization efforts between modules. Members of the SIG are also available to FOLIO developers and other SIGs for consultation on ensuring good internationalization practices are followed.

The Google doc has a proposed list of commitments for SIG participants and a place for you to add your name as someone interested in the SIG’s work. Please propose changes to the SIG document that you see are needed, or reply to this post with general comments. I’ll host a conference call about the proposed SIG on Wednesday, January 18th, 2017, at 9:00am Eastern Standard Time (New York, GMT-5:00). The conference call will be on WebEx (meeting URL), and the meeting number (or access code) is 645 005 608. (Global call-in numbers.)


#2

Hi Peter. Right-to-left orientation, for supporting languages like Arabic, Urdu, and Hebrew, is an important issue that should be listed among the topics covered in the FOLIO Internationalization document. Also, word stemming and/or morphological analysis, to handle prefix/infix/postfix strings attached to words, are critical for proper searching, retrieving, and sorting of records in these languages. The latter is best handled by Solr FTR or ElasticSearch FTR, if supported by the FOLIO platform.
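On the right-to-left point, here is a minimal sketch of how a piece of stored text could be checked for its base direction, using the "first strong character" heuristic. The function name `base_direction` and its `default` parameter are made up for illustration; nothing here is part of FOLIO.

```python
import unicodedata


def base_direction(text: str, default: str = "ltr") -> str:
    """Guess the base direction of text from its first strongly directional character.

    Hypothetical helper for illustration only.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        # "R" (e.g. Hebrew) and "AL" (Arabic letters) are strong right-to-left classes.
        if bidi_class in ("R", "AL"):
            return "rtl"
        # "L" is the strong left-to-right class (e.g. Latin letters).
        if bidi_class == "L":
            return "ltr"
    # No strong character found (digits, punctuation, empty string).
    return default
```

The result could, for example, be used to set the `dir` attribute on the HTML element that displays the value.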


#3

Thanks for the additional example of what needs to be covered. I’ve added bi-directional text support to the purpose statement so we don’t lose track of it. The purpose statement mentions “tokenization and sorting of strings” now; are word stemming and morphological analysis important concepts to explicitly list, or does the existing purpose statement provide a way for us to keep track of these needs?


#4

If “tokenization” covers stripping prefixes off words, then I guess word stemming is covered. For example, “الـ” would be removed from the word “الكتب” (which means “The Books”) before indexing and before searching, so that the remaining string “كتب” is what gets indexed and searched for.
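As a rough illustration of the prefix stripping described above, here is a deliberately naive sketch. The function name and the `min_stem_len` guard are assumptions for the example; a real stemmer (such as the ones in Solr or ElasticSearch) handles many more prefixes, suffixes, and exceptions.

```python
DEFINITE_ARTICLE = "\u0627\u0644"  # Alif + Lam, the Arabic definite article
TATWEEL = "\u0640"                 # elongation mark, carries no meaning


def strip_definite_article(word: str, min_stem_len: int = 2) -> str:
    """Remove a leading definite article from a single Arabic word.

    Naive, hypothetical helper: it does not know about words where
    Alif + Lam is part of the root.
    """
    word = word.replace(TATWEEL, "")
    stem = word[len(DEFINITE_ARTICLE):]
    # Only strip when something meaningful is left after the article.
    if word.startswith(DEFINITE_ARTICLE) and len(stem) >= min_stem_len:
        return stem
    return word
```

The same function would have to be applied both when indexing a record and when processing the search query, so that the two sides match.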

Morphological analysis is a more advanced step, often used in sophisticated Arabic searching and/or machine translation, to handle cases where a word is infixed with letters that change its state from plural to singular. For example, the word “الكتاب” (which means “The Book”) includes the letter Alif “ا” inserted before the last letter to form the word “كتاب”, meaning a single book.

We normally find the Full Text and Retrieval (FTR) engines the perfect place to embed our Arabic Text Handling Technology, to execute stemming and morphological analysis on Arabic words and statements. With Arabic Koha ILS, for instance, recently empowered by ElasticSearch integration, we managed to handle Arabic word stemming nicely via ElasticSearch. With Arabic DSpace IR integrated with Apache Solr, we managed to implement the same functionality as well.
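For readers unfamiliar with how this is switched on, here is a sketch of an index definition that assigns ElasticSearch's built-in `arabic` language analyzer to a few text fields. This is not the configuration used in the Koha installation mentioned above; the helper name and the field names are hypothetical, and the body uses the typeless mapping layout of recent ElasticSearch versions (older versions nest the properties under a type name).

```python
import json


def build_index_body(text_fields):
    """Build an index-creation body that analyzes the given fields as Arabic.

    Hypothetical helper; field names are supplied by the caller.
    """
    return {
        "mappings": {
            "properties": {
                # "arabic" is a built-in language analyzer that applies
                # Arabic normalization and stemming.
                name: {"type": "text", "analyzer": "arabic"}
                for name in text_fields
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_index_body(["title", "author"]), indent=2))
```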

I would hope that the FOLIO platform will readily accommodate FTR microservices, such as Solr and ElasticSearch, so that they integrate seamlessly with the indexing components of the Cataloging modules.

Thanks,
Massoud.


#5

Thank you, Massoud. I think the balancing act is between making the description of the purpose too long and making sure the SIG knows what it needs to do. With that in mind, I think the purpose paragraph has enough detail to cover this (“tokenization” may not be the right word, but it gets close). That is just my opinion, though – if there are clearer phrases that could be used to describe this need, please suggest them. Thanks for your interest in FOLIO, and I hope you are able to join the SIG.


#6

Hi Massoud,
Thanks for your input. I looked at the embedded links in your reply two days ago, and today I wanted to show your examples to my colleagues, but unfortunately I do not see the content now (the Arabic records in Koha). Is it possible to activate the examples, or send the links again?
Best Charlotte Whitt (Index Data)


#7

My pleasure.

Arabic Koha 16.11 OPAC with Arabic-enabled ElasticSearch:
http://54.194.143.35/cgi-bin/koha/opac-main.pl

Arabic DSpace 5.1 with Arabic-enabled Solr:
http://repository.taibahu.edu.sa/

Arabic VuFind 3.2 with Arabic-enabled Solr:
http://arkoha.maktabat-online.net/vufind/

Arabic Koha 3.22 OPAC with Arabic-enabled Zebra:
http://arkoha.maktabat-online.net/

Thanks,
Massoud.


#8

We had the birds-of-a-feather meeting on WebEx today (thanks for joining @Charlotte_Whitt and @julianladisch), and this proposal looks to be in good shape. I’m going to forward it to the product council for one final review, then we can get started on the SIG’s work.


#9

Hi Peter.

Sorry I missed the meeting.

What do I need to do to be notified of any coming meeting, besides subscribing to the SIG?

I already subscribed to the Internationalization birds-of-a-feather discussion SIG over a week ago, with both a personal and a company account.

Thanks,
Massoud.


#10

Hi, Massoud. Missing the meeting is not a problem. Your comments were added to the draft and there were no new issues that came up at the meeting. The first official SIG meeting will be announced here, so watch for that notification.


#11

Hello friends at the Int’l SIG,

I want to bring to your attention an issue I have faced with ILS multilanguage support, one that is most of the time overlooked by ILS companies in their efforts to provide localization support for countries with multilingual needs.

ILS system policies are most of the time structured to work in a single-language mode only. That is, even though the screen messages of an ILS support multilanguage interfaces, policies are saved into the database with a single name/description column, so when you translate its content to another language, the translated text appears in all interfaces, including the English interface.

For example, item types, library names/locations, and MARC bibliographic and authority tag names can only be maintained in one language in Arabic Koha ILS (English or Arabic policies, but not both). Most of the ILS systems I have dealt with in the past, including Unicorn (now Symphony) and Koha, face the same issue. When you view this title in MARC view, for example, or when you try to select a library next to the search box, you have to deal with these pieces of information in Arabic even though the screen orientation is English.

I can recommend the way PKP OJS 3 has approached the multilanguage system requirement. Drupal 8 also does a great job of supporting multilanguage websites.
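To make the single-column problem concrete, here is a minimal sketch of one alternative: storing a policy name as a set of per-locale values with a fallback, instead of one string. This is only an illustration and does not describe how OJS or Drupal implement it; the class name `LocalizedText` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalizedText:
    """A label stored once per locale (hypothetical structure)."""

    default_locale: str
    translations: Dict[str, str] = field(default_factory=dict)

    def get(self, locale: str) -> str:
        # 1. Exact match, e.g. "ar" or "en-US".
        if locale in self.translations:
            return self.translations[locale]
        # 2. Language-only match, e.g. "en-US" falls back to "en".
        language = locale.split("-")[0]
        if language in self.translations:
            return self.translations[language]
        # 3. Fall back to the default locale of the installation.
        return self.translations[self.default_locale]


# An item type maintained in both English and Arabic.
item_type = LocalizedText("en", {"en": "Book", "ar": "كتاب"})
```

With this shape, `item_type.get("ar")` gives the Arabic name in the Arabic interface, while the English interface keeps showing "Book".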

I hope this note can be helpful.

Massoud.