
Machine translation is not quite good enough for health information

By: Marco Campana
March 29, 2021

On a forum I'm on, someone posted that visitors can access COVID-19 vaccine info "in YOUR LANGUAGE" on the Region of Waterloo website "with the click of a button."

At a time when getting public health information into the hands of as many people as possible matters, machine translation seems useful, right? Here's the thing: Google Translate and other machine translation services simply don't cut it. Evidence and info below.

You know what does cut it? Effort. Investment. Community involvement. Check out this Twitter thread for an example of how to do it right.

The lessons she shares? Worth repeating here:

"1- Go broad, and then narrow. We began by organizing a few general all-communities vaccine education townhalls targeting groups such as elderly followed by specific ethnic/racialized groups based ones done for the Spanish-speaking, Black and South Asian communities.

2- Co-create with BIPOC organizations in local communities. We partnered with 14 community organizations and community ambassadors. This allowed us to get questions directly from community members to make our sessions culturally sensitive and appropriate.

3- Use a variety of formats. We used Zoom as our base. But then broadcasted on YouTube, Facebook as well as ethnic TV taking over prime soap opera time. We also collected questions directly from our communities not just via email, but also WhatsApp and social media.

4- We approached trusted health care professionals from within the communities. When possible, we also leveraged language capabilities. When that wasn’t possible, we reached out to interpreters to offer real-time simultaneous translations.

5- We spent significant time understanding which languages had the greatest need. For instance, a specific language might be a common second language - but important to understand what dual English literacy is within this community. Prioritize based on that.

6- The other advantage of co-creating with BIPOC orgs is that we didn’t need to spend too much effort on marketing or PR. Because these were organized with everyone’s input, everyone felt involved and helped share within their networks.

Oh, and we did these with $0 budget. Never underestimate the power of community. Along the way, we met incredible people who were just as invested as us to increase awareness. This included ethnic media - TV and radio, language interpreters, technical expertise & more."

Why isn't "good enough" good enough?

Using Google Translate for this kind of key, core information is potentially problematic. I translated the first 3 paragraphs on the Waterloo Region page into Arabic, and then back into English, using Google Translate (that's how I tend to test translation accuracy). Even in those few short paragraphs there are several errors. How many might there be across the whole page? In other languages? How significant might these errors be for someone trying to access and act on the information?

There are documented problems with using Google Translate, particularly for health-related information and including during the pandemic. These problems erode trust and can feed misinformation. A simple web search reveals:

Federal [Australian] Government used Google Translate for COVID-19 messaging aimed at multicultural communities
"In August, the ABC revealed "nonsensical" and "laughable" language translations of COVID-19 public health messages had been distributed to multicultural communities.

This prompted fears that migrants and refugees would lose trust in the Government's handling of the crisis."

Google Translate still isn’t good enough for medical instructions
"The new study evaluated 400 emergency department discharge instructions translated by Google Translate into seven different languages: Spanish, Chinese, Vietnamese, Tagalog, Korean, Armenian, and Farsi. Native speakers read the translations and evaluated their accuracy. Overall, the translated instructions were over 80 percent accurate."

Is 80% accuracy fine when you're trying to get the gist of something, perhaps in a personal interaction or when quickly trying to understand some text? Probably, and I use Google Translate to help with that. But when information accuracy needs to be closer to 100%, it can't be relied on.

This article provides some additional useful context and perhaps a glimpse into a possible future: "There are more than 7,000 languages in the world, 4,000 of which are written. Yet only 100 or so can be translated by automated tools such as Google Translate. New research promises to let us communicate with the others too."

Google itself acknowledges limitations: "Advances in machine learning (ML) have driven improvements to automated translation, including the GNMT neural translation model introduced in Translate in 2016, that have enabled great improvements to the quality of translation for over 100 languages. Nevertheless, state-of-the-art systems lag significantly behind human performance in all but the most specific translation tasks. And while the research community has developed techniques that are successful for high-resource languages like Spanish and German, for which there exist copious amounts of training data, performance on low-resource languages, like Yoruba or Malayalam, still leaves much to be desired. Many techniques have demonstrated significant gains for low-resource languages in controlled research settings (e.g., the WMT Evaluation Campaign), however these results on smaller, publicly available datasets may not easily transition to large, web-crawled datasets."

Translating and localizing information takes effort and investment. Herculean volunteer efforts are admirable, amazing, and important. But it is perhaps even more important for authorities to spend the time and money to get it right.
