In a world where communication is key, Google Translate has worked to break down language barriers, enabling people to connect and understand each other better in the global digital ecosystem. Harnessing the latest technologies, Google aims to make this powerful tool accessible to a wider audience than ever before.
In 2022, Google Translate made a significant leap by adding 24 new languages using Zero-Shot Machine Translation, a method in which a machine learning model learns to translate into a language without ever seeing an example translation. This breakthrough was part of Google’s ambitious 1,000 Languages Initiative, a commitment to build AI models that support the 1,000 most spoken languages in the world.
The company recently announced a monumental update, driven by its PaLM 2 large language model. This expansion added 110 new languages to Google Translate, the largest expansion in its history. “We are now using AI to expand our language offerings,” the company stated, highlighting the use of cutting-edge technology to enhance its services.
These new additions encompass languages spoken by over 614 million people, roughly 8% of the global population. The range of languages varies from widely spoken tongues like Cantonese to those belonging to smaller Indigenous communities, such as Qʼeqchiʼ, many of which are in the process of revitalization.
A notable aspect of this expansion is its focus on African languages. New additions like Fon, Kikongo, Luo, Ga, Swati, Venda, and Wolof underscore Google’s commitment to supporting linguistic diversity on the continent. This marks the largest inclusion of African languages in the tool’s history.
The newly added languages include Afar, a language spoken in Djibouti, Eritrea, and Ethiopia. Cantonese, long one of the most requested languages, often overlaps with Mandarin in writing. Manx, the Celtic language of the Isle of Man, nearly went extinct with the death of its last native speaker in 1974 and has since been revived. NKo is a standardized form of the West African Manding languages, with a unique alphabet created in 1949.
Furthermore, Punjabi (Shahmukhi), the variety of Punjabi written in the Perso-Arabic script, is the most spoken language in Pakistan. Tamazight (Amazigh), a Berber language from North Africa, is supported in both Latin and Tifinagh scripts. Tok Pisin is an English-based creole and the lingua franca of Papua New Guinea.
The process of adding new languages is intricate, involving careful consideration of regional varieties, dialects, and spelling standards. Google’s approach prioritizes the most commonly used varieties. For instance, their model for Romani generates text that closely resembles Southern Vlax Romani, while incorporating elements from Northern Vlax and Balkan Romani.
PaLM 2 plays a pivotal role in this expansion, efficiently learning languages closely related to existing ones, such as languages close to Hindi (Awadhi, Marwadi) and French creoles (Seychellois Creole, Mauritian Creole). As technology advances, and through ongoing collaboration with linguists and native speakers, Google remains committed to supporting an ever-growing array of language varieties and spelling conventions.