On Romanization (Pt.4)

Time for some solutions.

As described in the previous posts, there are tons of problems with the romanization of names and titles in bibliographic metadata. I’ve found articles and letters going back to the earliest days of the implementation of this romanization decrying how it doesn’t serve our communities and is impractical. I’ve found a 2009 article by Michael Brewer that describes learning ALA-LC romanizations as a key competency for students of Slavic studies. If our users have to learn new and unnecessary skills in order to use our libraries, then we aren’t doing our jobs. Here is a quick breakdown of my complaints (and the complaints of others):

  • Libraries are reluctant to include any vernacular script that is not one of the JACKPHY+ scripts. This makes it hard to search in the languages that use these scripts and disadvantages the Global South.
  • ALA-LC romanization is non-intuitive and inconsistent. It does not usually match the romanization schemes preferred by scholars or native speakers of a language. It often presupposes that letters from a single script are employed the same way in every language.
  • Libraries have a habit of taking on tasks that they ought not to. They do things like standardize place names, invent language codes, and come up with romanization schemes. Please stop. Others do this better.
  • It goes against the spirit of RDA. Previous cataloging rules allowed for all kinds of shorthands and abbreviations, but RDA emphasizes transcribing information directly from the piece. Romanizing feels like a violation of that principle.

So how do we remedy this?

This is a tricky question, as MARC, the current standard for bibliographic metadata, is (supposedly) dying. There has been great excitement in the library world over the introduction of BIBFRAME, which is a new standard that will further separate current practice from the practices that were employed when catalogers had to type out information on catalog cards. Because fixing romanization is a project for the future, I’ll remain standard-neutral and put out some ideas that could theoretically be applied to any standard.

  1. Transcribe what is on the piece in the vernacular for all relevant elements. This could be the title, author name, publication information or other information. Make this the main piece of information, rather than a secondary one.
  2. Create software that can automatically romanize into a variety of romanization schemes. For Cyrillic, this could include ISO 9, Russian passport transliteration, scholarly transliteration, Azeri Cyrillic to Azeri Latin, and even ALA-LC. I know this sort of algorithm is difficult, but others have done it and we don’t need to re-invent the wheel.
  3. Find a way of linking these romanized bits of information to the vernacular. This is already done (see my previous posts about 880 fields). Also include the type of romanization that is being used in any given field so that we can adjust in the future if, say, a Thai user using a Latin script keyboard is having a hard time finding material in their native language because current schemes are insufficient.
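
To make the three steps above a little more concrete, here is a minimal Python sketch. It is not tied to MARC, BIBFRAME, or any existing tool: the names `SCHEMES`, `romanize`, `TranscribedElement`, and `describe` are made up for illustration, and the two tables are deliberately incomplete, covering only the handful of Russian letters needed for the example. A real tool would need full tables for each language, not just each script.

```python
from dataclasses import dataclass, field

# Illustrative, deliberately incomplete tables for Russian Cyrillic.
# Only the letters needed for the examples below are included.
SCHEMES = {
    "iso9": {
        "в": "v", "е": "e", "й": "j", "л": "l", "о": "o",
        "с": "s", "т": "t", "х": "h", "ч": "č",
    },
    "ala-lc": {
        "в": "v", "е": "e", "й": "ĭ", "л": "l", "о": "o",
        "с": "s", "т": "t", "х": "kh", "ч": "ch",
    },
}


def romanize(text: str, scheme: str) -> str:
    """Romanize text letter by letter using the named scheme's table."""
    table = SCHEMES[scheme]
    out = []
    for ch in text:
        mapped = table.get(ch.lower())
        if mapped is None:
            out.append(ch)                    # pass unmapped characters through
        elif ch.isupper():
            out.append(mapped.capitalize())   # "ch" becomes "Ch"
        else:
            out.append(mapped)
    return "".join(out)


@dataclass
class TranscribedElement:
    """The vernacular is the main value; romanizations hang off it."""
    vernacular: str
    romanized: dict = field(default_factory=dict)  # scheme name -> romanized form


def describe(vernacular: str) -> TranscribedElement:
    """Transcribe the vernacular, then derive one labelled form per scheme."""
    element = TranscribedElement(vernacular)
    for scheme in SCHEMES:
        element.romanized[scheme] = romanize(vernacular, scheme)
    return element


title = describe("Толстой")
print(title.vernacular)
for scheme, value in title.romanized.items():
    print(scheme, value)
```

Because every romanized form carries the name of the scheme that produced it, adding a new scheme later means adding a table, not re-cataloging anything.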

As a community, we have a few further decisions to make. Will these transcribed titles be stored in our bibliographic records, or will romanization automatically happen when the user interface communicates with our database? Should we further indicate whether we have employed the vernacular or romanized information in our records? (I say yes.)
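
As one way of picturing the "stored in the record" option, the sketch below keeps romanized forms next to the vernacular and reports which form a search actually matched. Everything here is hypothetical: the record layout, the `fold`, `build_index`, and `search` helpers, and the exact-match lookup are assumptions for illustration, not how any existing catalog works.

```python
import unicodedata

# Hypothetical record: vernacular title plus romanizations labelled by scheme.
RECORDS = [
    {
        "id": 1,
        "vernacular": "Чехов",
        "romanized": {"iso9": "Čehov", "ala-lc": "Chekhov"},
    },
]


def fold(text: str) -> str:
    """Strip combining marks and case so that 'Čehov' matches 'cehov'.

    This is loose matching: it also folds Cyrillic й to и, which is
    acceptable for a sketch but would need more care in practice.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def build_index(records):
    """Map every folded form to (record id, label of the form that matched)."""
    index = {}
    for record in records:
        key = fold(record["vernacular"])
        index.setdefault(key, []).append((record["id"], "vernacular"))
        for scheme, value in record["romanized"].items():
            index.setdefault(fold(value), []).append((record["id"], scheme))
    return index


INDEX = build_index(RECORDS)


def search(query: str):
    """Return (record id, matched form) pairs for an exact folded match."""
    return INDEX.get(fold(query), [])


print(search("chekhov"))
print(search("Чехов"))
```

The other option, romanizing on the fly when the interface talks to the database, would drop the stored `romanized` entries and generate them at query time. Either way, the label on each match is what tells us whether the user found the item through the vernacular or through a particular scheme.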

As cataloging improves and advances with new technology, we have the opportunity to change how we deal with non-Latin scripts. Let’s enter the 21st century and use Unicode, use the input tools that every computer currently offers, and use the vernacular that our users should expect.