Providing access to help for non-native English speakers is a matter of pragmatism and empathy. Here are some things to consider as you pick which languages to support for your area.
Limited English Proficiency
When picking a subset of languages, it’s important to pay attention not just to the absolute number of speakers of each language, but also to how many of those speakers have Limited English Proficiency (LEP).
Populations with high rates of LEP reflect not only the true number of people who need translation services directly, but also potential help seekers who are less likely to know someone in their own community who is proficient enough in English to help them.
An example in Washington State.
Every state has statistics on which languages are spoken in a given area. For this example we’ll look at stats for Washington State: https://statisticalatlas.com/state/Washington/Languages
If we pick the top five languages by number of speakers it might look like this:
Spanish
Chinese
Vietnamese
Russian
Tagalog
However, if we take the top five languages by LEP rate, the list looks like this:
Arabic
Russian
Tai-Kadai
Afro-Asiatic
Other Slavic
The only language in common is Russian.
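The same comparison can be run for any area once you have speaker counts and LEP rates in hand. Here is a minimal Python sketch of the idea. The figures are made-up placeholders (not the Washington State statistics), and the names LanguageStat and top_n are hypothetical, chosen only for this illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageStat:
    name: str
    speakers: int    # number of speakers in the area
    lep_rate: float  # fraction of those speakers with Limited English Proficiency


# Placeholder figures for illustration only; substitute real statistics for your area.
STATS = [
    LanguageStat("Language A", 500_000, 0.35),
    LanguageStat("Language B", 90_000, 0.45),
    LanguageStat("Language C", 60_000, 0.55),
    LanguageStat("Language D", 20_000, 0.60),
    LanguageStat("Language E", 5_000, 0.70),
    LanguageStat("Language F", 3_000, 0.20),
]


def top_n(stats, key, n):
    """Return the names of the n languages with the highest value of `key`."""
    ranked = sorted(stats, key=key, reverse=True)
    return [s.name for s in ranked[:n]]


by_speakers = top_n(STATS, key=lambda s: s.speakers, n=3)
by_lep_rate = top_n(STATS, key=lambda s: s.lep_rate, n=3)

# Languages that appear in both rankings, in speaker-count order.
in_both = [name for name in by_speakers if name in by_lep_rate]

print("Top by speakers:", by_speakers)
print("Top by LEP rate:", by_lep_rate)
print("In both lists:", in_both)
```

With these placeholder numbers the two rankings share only one language, which mirrors the kind of mismatch described above.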
Picking the most helpful languages.
In reality, we have to make decisions based on a number of factors, including absolute population size. In Washington, for example, we have a large population of Spanish speakers, so Spanish deserves a place even though it is not among the top five by LEP. I might choose all languages with an LEP rate higher than ~40% and a share of the population greater than ~0.2%:
Vietnamese
Korean
Chinese
Other Slavic
Afro-Asiatic
Russian
Spanish
That’s a helpful and still cost-effective number of languages.
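The two-threshold rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ~40% and ~0.2% cutoffs come from the text above, while the data values are made-up placeholders and the names LanguageStat and select_languages are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageStat:
    name: str
    population_share: float  # speakers as a fraction of the area's total population
    lep_rate: float          # fraction of speakers with Limited English Proficiency


def select_languages(stats, min_lep_rate=0.40, min_population_share=0.002):
    """Keep languages that exceed both thresholds (~40% LEP rate, ~0.2% of population)."""
    return [
        s.name
        for s in stats
        if s.lep_rate > min_lep_rate and s.population_share > min_population_share
    ]


# Placeholder figures for illustration only; substitute real statistics for your area.
STATS = [
    LanguageStat("Language A", 0.0800, 0.42),  # passes both thresholds
    LanguageStat("Language B", 0.0100, 0.55),  # passes both thresholds
    LanguageStat("Language C", 0.0010, 0.70),  # high LEP rate, but too few speakers
    LanguageStat("Language D", 0.0150, 0.20),  # many speakers, but low LEP rate
]

selected = select_languages(STATS)
print(selected)
```

Keeping the thresholds as parameters makes it easy to try a stricter or looser cutoff and see how the number of supported languages, and therefore the cost, changes.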
Other factors to be aware of.
Our list above also illustrates the kinds of edge cases that come up when choosing translations:
“Other Slavic” and “Afro-Asiatic” are baskets of languages that have to be understood. For example, I happen to know that in Washington State “Other Slavic” is mostly Ukrainian, and that’s a language commonly supported locally. The languages in “Afro-Asiatic” are harder to determine, and may individually fall far below our threshold for inclusion.
“Chinese” contains many dialects. Our system recognizes up to seven or eight different supported dialects. Choosing the right ones for a given area can be difficult if you don’t already know them. In general, we recommend supporting both Chinese Simplified (language code zh-Hans) and Chinese Traditional (language code zh-Hant) for Chinese.
Language Options
We support most of the languages from Microsoft’s Azure ML Translation (https://learn.microsoft.com/en-us/azure/ai-services/translator/language-support). However, there are edge cases where we can’t support languages in that list.
Some languages don’t have good ML translation support. For example, we have received several requests for Karenni, but this language simply has not been trained for ML support in any online service we’re currently aware of.
Summary
In an ideal world we could support all languages all the time. Many cloud or Machine Learning translation services offer to do just that. However, their costs are prohibitive for most nonprofit work. We have devised a way to manage translations for resources more cost-effectively, but it does require picking a subset of languages to support. I hope this article gives you a good basis for determining what that list should be.
