June 18, 2019
Translators without Borders (TWB) is using support from the Cisco Foundation to allow more people to share their voice. The organization is planning to develop machine translation and give access to information in marginalized languages.
Despite globalization and growing internet penetration rates worldwide, people in many countries have to overcome linguistic barriers to communication. Information is still available only to people who speak the languages of the internet.
But things are about to change. This new project involving machine translation could give real access to information to people speaking marginalized languages. “Gamayun” is a language equality initiative meant to make translation into and from languages that are underrepresented on the internet easier.
Gamayun includes machine translation for 10 marginalized languages to improve the quality of humanitarian services in crisis-affected communities.
TWB and Cisco are looking to scale a program to develop machine translation and improve two-way communication at global levels. Let’s see how machine translation could impact international communication and populations worldwide.
The overwhelming majority of the roughly 7,000 languages spoken today have limited or no support online. Many of them aren’t rare languages spoken by a handful of people, but languages and dialects with millions of speakers.
Google Translate, for instance, supports a little over 100 languages worldwide. Measured against roughly 7,000 languages, that’s less than 2 percent of all languages! Odia, for example, is an official language in India and is spoken by 40 to 50 million people in the states of Odisha and Jharkhand. However, there’s no software to facilitate translation from and into Odia.
And this is just one example. India alone has over 20 official languages, and several of them aren’t available online. Hundreds of millions of people get information online in their second or third language, or don’t get information at all.
Grace Tang, TWB’s Gamayun Program Manager, explained to GALA: “People who speak marginalized languages lack access to critical and life-saving information. Furthermore, organizations often can’t efficiently process and respond to feedback from people who speak these languages.”
Translators without Borders is looking to make marginalized languages more accessible using language technology. Through Gamayun, the organization aims to develop machine translation for languages that currently have inconsistent levels of support on the internet.
The new software is meant to integrate several other development tools to enable speakers of marginalized languages to access information freely. TWB and Cisco are looking to provide access to information through both voice and text.
This initiative will also allow people who speak marginalized languages to communicate their needs and ideas in their native languages.
Machine translation requires enormous amounts of data to be effective. The software needs to analyze a large volume of parallel content, meaning the same text in both languages, to be able to provide correct translations between language pairs.
Ideally, this content includes a wide range of documents from all genres. From news reports and official documents to novels and films, all types of content are necessary to create the software’s translation memory.
If any of these pieces are missing, the software can’t translate communication reliably. When a machine translation system is trained only on social media posts, for instance, users can’t rely on it to translate a legal document.
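The coverage problem described above can be sketched in a few lines of Python. This is a minimal illustration, not TWB's tooling: the corpus entries, the domain labels, and the `domain_counts` and `missing_domains` helpers are all hypothetical, and the target sentences are placeholders rather than real translations.

```python
from collections import Counter

# Hypothetical parallel corpus: (source text, target text, domain label).
# Target strings are placeholders, not real translations.
PARALLEL_CORPUS = [
    ("Where is the clinic?", "<target sentence 1>", "humanitarian"),
    ("See you at the market later", "<target sentence 2>", "social_media"),
    ("lol the bus is late again", "<target sentence 3>", "social_media"),
]

# Domains we assume the finished system should be able to handle.
REQUIRED_DOMAINS = {"news", "legal", "social_media", "humanitarian"}


def domain_counts(corpus):
    """Count how many sentence pairs the corpus holds per domain."""
    return Counter(domain for _, _, domain in corpus)


def missing_domains(corpus, required, min_pairs=1):
    """Return required domains with fewer than min_pairs sentence pairs."""
    counts = domain_counts(corpus)
    return sorted(d for d in required if counts[d] < min_pairs)


if __name__ == "__main__":
    print(domain_counts(PARALLEL_CORPUS))
    print(missing_domains(PARALLEL_CORPUS, REQUIRED_DOMAINS))
```

In this toy corpus the legal and news domains have no sentence pairs at all, which is the situation the article warns about: a system built on it would have nothing to learn legal language from.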
It’s hard work, as these datasets require collaboration between translators, linguists, programmers, and developers.
Gamayun has already registered success with its pilot project, however. In the first phase, TWB developed a machine translation tool for Levantine Arabic. The software is used by the World Food Programme. For the pilot, TWB collaborated with PNGK and Prompsit to build a custom machine-translation tool in less than two months.
For the initial phase of the project, the software handled domain-specific terminology and colloquialisms from social media. According to developers, the results were promising, which means that such tools can help people communicate more effectively, regardless of the language they speak.
Machine translation has become a crucial tool for language service providers. The advantages of using software for translations are obvious. The rate of machine translation is significantly faster than that of human translation.
With machine translation, language experts can get thousands of translated words each minute. The generated content isn’t high quality right away, but it speeds up the translation process. Even with a rigorous post-editing process in place, a workflow built on machine translation and CAT (computer-assisted translation) tools needs a fraction of the time you would spend on 100 percent human translation.
Another advantage of using machine translation is reduced costs. As you get fast turnarounds, you save tens and even hundreds of hours of work.
The post-editing process can be lengthy. However, you still save significant time and energy when you use machine translation as a first version of the content in the target language.
Above all, the most significant advantage of using machine translation is its ability to memorize exhaustive glossaries of terms and reuse them for further projects.
Key terms and industry-specific terminology can be a challenge, even for the best linguists. When you have a database with all terms always available, you can improve workflows and increase the accuracy and consistency of your translations.
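To make the idea of a reusable term database concrete, here is a minimal sketch of a terminology check in Python. It assumes a hypothetical English-to-French glossary and a hypothetical `glossary_violations` helper, and it uses plain substring matching; real translation tools handle inflection and word boundaries far more carefully.

```python
# Hypothetical glossary: approved target term for each source term.
# The entries are illustrative and not taken from any real project.
GLOSSARY = {
    "refugee camp": "camp de réfugiés",
    "food voucher": "bon alimentaire",
}


def glossary_violations(source, target, glossary):
    """Return source terms that occur in the source text but whose
    approved translation does not occur in the target text."""
    src = source.lower()
    tgt = target.lower()
    return [
        term
        for term, approved in glossary.items()
        if term in src and approved.lower() not in tgt
    ]


if __name__ == "__main__":
    source = "The refugee camp distributes a food voucher to each family."
    draft = "Le camp de réfugiés distribue un coupon à chaque famille."
    # Lists the glossary terms the draft translated inconsistently.
    print(glossary_violations(source, draft, GLOSSARY))
```

A check like this can run on every machine-translated draft before post-editing, so that editors see at a glance which approved terms were not used.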
Support from the Cisco Foundation allows Translators without Borders to start the second phase of the Gamayun language equality initiative. Thanks to Cisco’s support, TWB can begin working on two other pilot projects in 2019.
Each one will cover a different use case and language, developing text and voice datasets for two other marginalized languages. With these two new pilots, TWB can lay the foundation for a scalable model, which will enable the development of machine translation for 10 more languages in the next five years.
Erin Connor of the Cisco Foundation commented: “We believe that TWB’s Gamayun initiative presents an exciting opportunity to apply machine learning translation technology to marginalized languages, ultimately enabling better communication and humanitarian services to crisis-affected communities.”
According to GALA, TWB is looking for people willing to identify use cases, develop new technology, contribute language data, and fund further steps of the initiative. We wish them all the very best!