Discuss.FOLIO.org is no longer used. This is a static snapshot of the website as of February 14, 2023.

Re-thinking CSV and Tabular Data Export

btravis
28 Mar '21

FOLIO provides CSV-formatted export of tabular data in several places throughout the platform (loans, requests, fees/fines, etc.). Unfortunately, CSV files are not always opened “correctly” by the most common software users have to open CSV files: Excel. There are two problems:

  1. The default delimiter for CSV in Excel can vary by locale (eg. “;” is the default delimiter in a number of non-English European locales).
  2. Excel assumes ANSI encoding for CSV files (UTF-8 encoded characters are not rendered properly)

I would like to propose that the project undertake a refactor of how CSV and tabular data export is handled in FOLIO to make it more robust. At a minimum, we should respect the locale-based delimiter or provide an option in the UI or settings to set the delimiter used when generating the file. We should also begin including the UTF-8 or universal BOM (byte-order marking) on all of our generated CSV files to ensure that the UTF-8 character encodings are rendered correctly in Excel.

Long-term, we should provide user-selectable, multi-format exports using libraries that can directly generate valid XLS/XLSX files and TSV files, in addition to standard CSVs. Thoughts?

peter
29 Mar '21

Someone did attempt to create a “standard” for comma-separated-value files (RFC4180). Presumably, that is well handled by Excel?

btravis
29 Mar '21

No, Excel does not handle CSVs according to that specification in all locales. As indicated, it assumes “;” as the delimiter in locales where the comma is used for decimal notation.

peter
29 Mar '21

sigh — thus perfectly demonstrating the problems with CSV files. Good luck, Brooks!

enettifee
29 Mar '21

I definitely support the long term plan here but I’m not sure where the dev resources would come from for this to happen anytime soon.

maura
29 Mar '21

I support both items. I can’t imagine that other projects haven’t run into this problem. How big would the feature be?

lari
30 Mar '21

A quick fix to make CSV export for Excel that will work in all regions is to add "sep=," as the first line of the file. So FOLIO could have two export routes. One clean CSV without that line and one targeted for Excel that includes it.

Ann-Marie
30 Mar '21

I get nervous about commas, or most punctuation marks. For example, if you’re exporting title data from the Instance, almost any punctuation mark will be in it - comma, period, semi-colon, colon, etc. Would it make sense to consider something like | as the separator? We also need to agree on what is used as a separator between multiple values in a repeatable field. I totally agree that whatever is decided for these delimited exports should have consistent formatting across all the various apps. And we need to ensure that the Data Export app’s MARC output is not confused with any of this other delimited export work.

Like Maura, it seems to me like this should be pretty common across many projects - dealing with simple delimited exports, and making them work properly across various date/time and number formatting.

Lloyd_Chittenden
30 Mar '21

I like to use a ~ as a data separator when I export MARC data, but it is not standard in Excel at all. You always have to open Excel and tell it what the separator is.

enettifee
1 Apr '21

Another perhaps related issue is one I’ve seen in Bugfest this round - if you have barcodes that start with zeroes (which it looks like Chicago does, @dbottorff ) - then the CSV export → open in Excel removes the forward zeroes, and then if you copy/paste the barcode into Inventory from Excel it tells you the item doesn’t exist.

zburke
1 Apr '21

Our CSV implementation is a thin wrapper around a third-party library that offers many options (such as changing the delimiter, providing a BOM, quoting values so they are read as strings rather than numbers to prevent truncation of leading zeros, etc), including some specifically to handle Excel’s non-standard behaviors. In other words, there are hooks for dealing with all the issues raised in this thread so far, so this is more a matter of providing a more elaborate configuration than writing a new implementation from scratch

We could add a table that would automatically set certain options based on locale. We could also add the ability to store per-tenant options in mod-configuration. We could also do both, i.e. provide default values for certain locales while also allowing tenant config to override. This is not a heavy lift; we just have to store a laundry list of configuration options but there is no complicated logic to implement because that’s already been done by the third-party library.

Ann-Marie
2 Apr '21

Hi @enettifee Yes, the dreaded leading zeroes and Excel. One of the best side effects of the ISBN expansion was never having to deal with leading zeroes on 10-digit ISBNs again, since they all could be re-expressed as 13-digit-equivalents that never started with a “0”

UschiKlute
5 Apr '21

This is a possible solution:
Save the file as *.txt.
Then start Excel and open the text file. In the following conversion steps one can

  • select UTF-8 as character set
  • Set the separator (here: comma)
  • format all/individual columns as text (to prevent deleting leading zeroes)

This results in a nice Excel file.
But: to create a list “just quickly”, this is too time-consuming.

UschiKlute
5 Apr '21

Additionally I would like to have the name of the Loan Policy as usually I do not know which loan policy 43198de5-f56a-4a53-a0bd-5a324418967a is :wink: