How to know which language translations you should support.

If you only pick a few languages, how do you know which ones to prioritize?

Written by Skyler Young
Updated over 4 months ago

Providing access to help for non-native English speakers is a matter of pragmatism and empathy. Here are some things to consider as you pick which languages to support for your area.

Limited English Proficiency

When picking a subset of languages, it’s important to pay attention not just to the absolute number of speakers of each language, but also to how many of those speakers have Limited English Proficiency (LEP).

A high LEP rate tells you two things about a population. Its members are the people who need translation services directly, and they are also less likely to know someone in their own community who is proficient enough in English to help them.

An example in Washington State.

Every state has statistics on which languages are spoken in a given area. For this example we’ll look at stats for Washington State: https://statisticalatlas.com/state/Washington/Languages

If we pick the top five languages by number of speakers it might look like this:

  • Spanish

  • Chinese

  • Vietnamese

  • Russian

  • Tagalog

However, if we take the top five languages by LEP rate then it looks like this:

  • Arabic

  • Russian

  • Tai-Kadai

  • Afro-Asiatic

  • Other Slavic

The only language in common is Russian.
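
The difference between the two rankings can be sketched in a few lines of Python. This is only an illustration: the figures are made-up placeholders, not Washington State statistics, and the names LanguageStat, top_by_speakers, and top_by_lep_rate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageStat:
    name: str
    speakers: int      # people who speak the language
    lep_speakers: int  # of those, people with Limited English Proficiency

    @property
    def lep_rate(self) -> float:
        # Share of this language's speakers who have LEP
        return self.lep_speakers / self.speakers if self.speakers else 0.0


# Placeholder figures for illustration only (not real statistics)
STATS = [
    LanguageStat("Language A", 500_000, 200_000),  # LEP rate 40%
    LanguageStat("Language B", 90_000, 45_000),    # LEP rate 50%
    LanguageStat("Language C", 20_000, 13_000),    # LEP rate 65%
    LanguageStat("Language D", 60_000, 12_000),    # LEP rate 20%
]


def top_by_speakers(stats, n):
    """Rank by absolute number of speakers."""
    ranked = sorted(stats, key=lambda s: s.speakers, reverse=True)
    return [s.name for s in ranked[:n]]


def top_by_lep_rate(stats, n):
    """Rank by the share of speakers with LEP."""
    ranked = sorted(stats, key=lambda s: s.lep_rate, reverse=True)
    return [s.name for s in ranked[:n]]


print(top_by_speakers(STATS, 2))
print(top_by_lep_rate(STATS, 2))
```

With these placeholder numbers the two rankings share only one language, which is the same effect the Washington lists show.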

Picking the most helpful languages.

In reality, we have to make decisions based on a number of factors, including absolute population size. In Washington, for example, we have a large population of Spanish speakers. I might choose all languages with a LEP rate higher than ~40% whose speakers make up more than ~0.2% of the population:

  • Vietnamese

  • Korean

  • Chinese

  • Other Slavic

  • Afro-Asiatic

  • Russian

  • Spanish

That’s a helpful and still cost-effective number of languages.
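
If your statistics are in a spreadsheet or file, the two-threshold rule can be applied with a short script. The following is a minimal sketch, assuming each row holds a speaker count and a LEP speaker count. The name select_languages, its parameters, and all of the numbers are hypothetical placeholders; the defaults mirror the approximate 40% and 0.2% thresholds used above.

```python
from typing import NamedTuple


class LanguageStat(NamedTuple):
    name: str
    speakers: int      # people who speak the language
    lep_speakers: int  # of those, people with Limited English Proficiency


def select_languages(stats, total_population,
                     min_lep_rate=0.40, min_population_share=0.002):
    """Return names of languages that exceed both thresholds."""
    selected = []
    for s in stats:
        if s.speakers == 0:
            continue  # avoid dividing by zero for empty rows
        lep_rate = s.lep_speakers / s.speakers
        population_share = s.speakers / total_population
        if lep_rate > min_lep_rate and population_share > min_population_share:
            selected.append(s.name)
    return selected


# Placeholder figures for illustration only (not real statistics)
TOTAL_POPULATION = 1_000_000
STATS = [
    LanguageStat("Language A", 80_000, 36_000),  # 45% LEP, 8% of population
    LanguageStat("Language B", 1_500, 1_200),    # 80% LEP, 0.15% of population
    LanguageStat("Language C", 30_000, 6_000),   # 20% LEP, 3% of population
    LanguageStat("Language D", 5_000, 2_500),    # 50% LEP, 0.5% of population
]

print(select_languages(STATS, TOTAL_POPULATION))
```

Because both thresholds are approximate, languages that land just under either cutoff are worth reviewing by hand rather than dropping automatically.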

Other factors to be aware of.

The list above also illustrates the kinds of edge cases that come up when choosing translations:

  1. “Other Slavic” and “Afro-Asiatic” are baskets of languages rather than single languages, so you need to find out what each one contains. For example, I happen to know that in Washington State “Other Slavic” is mostly Ukrainian, and that’s a language commonly supported locally. The languages in “Afro-Asiatic” are harder to determine, and may individually fall far below our threshold for inclusion.

  2. “Chinese” contains many dialects. Our system recognizes and supports seven or eight of them. Choosing the right ones for a given area can be difficult if you don’t already know which dialects are spoken there. In general, we recommend supporting both Chinese Simplified (language code zh-Hans) and Chinese Traditional (language code zh-Hant) for Chinese.
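
One way to record the outcome of these decisions is a small lookup table from statistical category to language codes. The sketch below is an assumption about how such a table could look, not a description of how our system stores languages; LANGUAGE_CODES and codes_for are hypothetical names. The Chinese codes come from the recommendation above, the other codes are standard ISO 639-1 identifiers, and “Other Slavic” is mapped to Ukrainian only because of the Washington State example.

```python
# Hypothetical lookup table: statistical category -> language codes to support
LANGUAGE_CODES = {
    "Spanish": ["es"],
    "Vietnamese": ["vi"],
    "Korean": ["ko"],
    "Russian": ["ru"],
    "Chinese": ["zh-Hans", "zh-Hant"],  # Simplified and Traditional
    "Other Slavic": ["uk"],             # assumed to be mostly Ukrainian locally
    "Afro-Asiatic": [],                 # basket not yet resolved to languages
}


def codes_for(selection):
    """Split a selection into language codes and categories needing review."""
    codes = []
    unresolved = []
    for name in selection:
        mapped = LANGUAGE_CODES.get(name, [])
        if mapped:
            codes.extend(mapped)
        else:
            unresolved.append(name)
    return codes, unresolved


codes, unresolved = codes_for(["Chinese", "Afro-Asiatic", "Other Slavic"])
print(codes)
print(unresolved)
```

Keeping unresolved baskets in a separate list makes it harder to forget that a category still needs local research before it can be translated.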

Language Options

We support most of the languages from Microsoft’s Azure ML Translation (https://learn.microsoft.com/en-us/azure/ai-services/translator/language-support). However, there are edge cases where we can’t support a language in that list.

Other languages don’t have good ML translation support at all. For example, we have received several requests for Karenni, but no online service we’re currently aware of has trained an ML translation model for it.

Summary

In an ideal world we could support all languages all the time. Many cloud or Machine Learning translation services offer to do just that. However, their costs are prohibitive for most nonprofit work. We have devised a more cost-effective way to manage translations for resources, but it does require picking a subset of languages to support. I hope this article gives you a good basis for determining what that list should be.