On Romanization (Pt.3)

So we’ve established that cataloging gets a bit messy once you have to decide whether to include the vernacular alongside the romanized form. But who decides what the romanized form looks like?

In short, it’s the American Library Association and the Library of Congress. They’ve produced tables for 75 different languages and/or scripts, although certain documents combine languages that use a single script (Hebrew and Yiddish, Non-Slavic Languages [in Cyrillic Script]), while others separate out languages that use these same scripts (Judeo-Arabic, Russian, etc.).

In some cases, the romanization scheme has been very thoughtfully constructed by all parties involved. For example, in 2012, the library world collaborated with the Cherokee people to produce a romanization scheme that was amenable to all. In other cases, however, the scheme was clearly assembled by people with little knowledge of the language involved.

If, for example, you look at the chart for non-Slavic languages in Cyrillic script, you see that no consideration has been given to how each language behaves. Instead, there was merely a failed attempt to assign every possible Cyrillic letter a romanized equivalent. If the scheme had been successful, that would be one thing, but it’s horribly inconsistent. Take a look at some of the following:

  • Tatar, Syriac, Kazakh ә is romanized as ă
  • Tatar-Kryashen, Mari, Karelian ӓ is romanized as ă
  • Khanty ӓ is romanized as ä
  • Chuvash ӑ is romanized as ă

Knowing what I know about these languages, the only two romanizations I can agree with are for Khanty and Chuvash; these are the romanizations that most linguists would use. For Tatar, Mari, Kazakh, etc. I would use ä. The romanization scheme is inconsistent – either provide a 1-to-1 romanization for all possible Cyrillic letters or treat each language individually.
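
To make the inconsistency concrete, here is a rough Python sketch that checks the four rules listed above for the two things a script-wide 1-to-1 table should never do: send several Cyrillic letters to the same Latin letter, or send one Cyrillic letter to several Latin letters. The names (`RULES`, `collisions`, `splits`) are my own for illustration, and the data is only what appears in the list, not the full ALA-LC table. The letters are written as code points because the Cyrillic and Latin lookalikes are impossible to tell apart by eye.

```python
from collections import defaultdict

# Code points spelled out, since several of these letters look identical.
CYR_SCHWA = "\u04d9"        # Cyrillic ә
CYR_A_DIAERESIS = "\u04d3"  # Cyrillic ӓ
CYR_A_BREVE = "\u04d1"      # Cyrillic ӑ
LAT_A_BREVE = "\u0103"      # Latin ă
LAT_A_DIAERESIS = "\u00e4"  # Latin ä

# (language, Cyrillic letter, romanization), taken only from the list above.
RULES = [
    ("Tatar", CYR_SCHWA, LAT_A_BREVE),
    ("Syriac", CYR_SCHWA, LAT_A_BREVE),
    ("Kazakh", CYR_SCHWA, LAT_A_BREVE),
    ("Tatar-Kryashen", CYR_A_DIAERESIS, LAT_A_BREVE),
    ("Mari", CYR_A_DIAERESIS, LAT_A_BREVE),
    ("Karelian", CYR_A_DIAERESIS, LAT_A_BREVE),
    ("Khanty", CYR_A_DIAERESIS, LAT_A_DIAERESIS),
    ("Chuvash", CYR_A_BREVE, LAT_A_BREVE),
]


def collisions(rules):
    """Latin letters that more than one Cyrillic letter is romanized as."""
    sources = defaultdict(set)
    for _language, cyrillic, roman in rules:
        sources[roman].add(cyrillic)
    return {roman: letters for roman, letters in sources.items() if len(letters) > 1}


def splits(rules):
    """Cyrillic letters that receive more than one romanization."""
    targets = defaultdict(set)
    for _language, cyrillic, roman in rules:
        targets[cyrillic].add(roman)
    return {cyrillic: romans for cyrillic, romans in targets.items() if len(romans) > 1}


print(collisions(RULES))  # Latin ă is shared by three different Cyrillic letters
print(splits(RULES))      # Cyrillic ӓ is romanized two different ways
```

On this small sample both checks fail: ă stands in for ә, ӓ and ӑ, so you cannot recover the Cyrillic from the romanized form, and ӓ comes out as ă or ä depending on the language, so the table isn't purely script-based either.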

As I just noted, scholarly treatment of these languages rarely aligns with ALA-LC. This is because:

  1. ALA-LC attempts to create a 1-to-1 system whereby you can easily work out the vernacular form from the romanized form
  2. ALA-LC has attempted to create an internal consistency based on script and not language (see the Cyrillic examples above)

For scholars working on minority languages, especially, it can be frustrating trying to locate materials in these languages when the romanization in the catalog does not align with the rest of the scholarly literature.

It’s bad enough to annoy scholars, but what about actual speakers of a language? What happens when they have their own romanization schemes? What happens when a language shifts from one script to a Latin-based one? This has happened several times in the former Soviet Union. Azerbaijani, Turkmen, Kazakh, Crimean Tatar and Uzbek have all converted from a Cyrillic script to a Latin-based one. While neither Tatar nor Belorussian has made this shift, both groups have definite preferences for how their languages should be presented in Latin script. As an example, here are some of the mismatches you’ll encounter when comparing ALA-LC to the official Latin standard for Azerbaijani:

Cyrillic    ALA-LC    Official Latin
Ҹ ҹ         j         c
Ч ч         ch        ç
Ә ә         ă         ə
Ҝ ҝ         ġ         g
Ө ө                   ö
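
As a rough illustration of what this mismatch means in practice, here is a small Python sketch that romanizes the same Cyrillic string under both schemes. The `AZERBAIJANI` mapping and the `romanize` function are hypothetical names of my own; the mapping holds only the four complete lowercase rows from the table above, and anything else is passed through unchanged, so this is nowhere near a full converter.

```python
# Cyrillic letter -> (ALA-LC, official Latin).
# Only the four complete rows from the table above, lowercase only.
AZERBAIJANI = {
    "\u04b9": ("j", "c"),            # ҹ -> j / c
    "\u0447": ("ch", "\u00e7"),      # ч -> ch / ç
    "\u04d9": ("\u0103", "\u0259"),  # ә -> ă / ə
    "\u049d": ("\u0121", "g"),       # ҝ -> ġ / g
}

# Which column of the tuple each scheme lives in.
SCHEMES = {"ala_lc": 0, "official": 1}


def romanize(text, scheme):
    """Romanize the letters we know about; leave everything else untouched."""
    column = SCHEMES[scheme]
    return "".join(
        AZERBAIJANI[ch][column] if ch in AZERBAIJANI else ch
        for ch in text
    )


letters = "\u04b9\u0447\u04d9\u049d"  # ҹчәҝ
print(romanize(letters, "ala_lc"))    # jchăġ
print(romanize(letters, "official"))  # cçəg
```

Four letters in, and the two outputs don't share a single character. A speaker who types the official Latin spelling into a catalog that stores the ALA-LC form won't get a match on any word containing these letters.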

This disconnect is less than ideal because it means that a native speaker of Azerbaijani not only has to know the Latin script that is currently taught in schools and the Cyrillic script that was used up until the early 90s, but also has to learn the ALA-LC romanization that is used in American and British libraries. And if that same speaker were to go to Germany, they would have to learn the system used there!

I’m not opposed to romanization. While we do have access to input methods for most of the world’s languages, they are still imperfect. I, for example, have no problem reading Cyrillic, yet still have a hard time typing it. Romanization makes things easy.

However, the current system is broken. There is no internal consistency, and the wishes of native speakers are overlooked. I have a few suggestions for how to proceed, and I’ll describe them in the next post.
