Kamusi: Multilingual Dictionary Aims to Aid Machine Translation with All Words in Every Language [VIDEO]

By Staff Reporter on February 26, 2013 9:13 PM EST

A new online multilingual dictionary hopes to compile all words from every language in the world. (Photo: Creative Commons)

An interactive multilingual dictionary has long been a pipe dream of artificial intelligence researchers.

While algorithm-based machine translation products like Google Translate can quickly convey the gist of a text in another language, they cannot explain what the words mean.

Now, an extremely ambitious online dictionary hopes to solve that problem by compiling every word in every language. If funded, it could change the way machines translate between languages.


The self-proclaimed "Dictionary of the Future" is called the Kamusi Global Online Living Dictionary (GOLD). It launched this week, shortly after International Mother Language Day on February 21.

The Kamusi Project began in 1994 as a collaborative bilingual dictionary of Swahili and English, launched by anthropologist and lexicographer Martin Benjamin when he was a graduate student at Yale. It was named "Kamusi" after the Swahili word for dictionary.

The model anticipated Wikipedia by several years: a crowd-sourced guide to Swahili that community members built up entry by entry.

With grant support from the US Department of Education and private donations, the Kamusi Project eventually grew into one of the world's most popular Swahili language resources.

Benjamin has long had broader goals, and after four years of work on the Kamusi Project, his team has now released a re-engineered platform that can support resources for any and all languages.

At the time of launch, the Kamusi GOLD demonstration version includes 100 words from 15 languages, including English, Swahili, French, Gusii, Hehe, Japanese, Kinyarwanda, Luganda, Mandarin Chinese, Pulaar, Romanian, Setswana, Songhay, Spanish, and Yeyi.

The initial focus of the project is African languages, continuing Kamusi's original interest in Swahili and extending it to the roughly 2,000 other languages spoken in Africa. According to AllAfrica, all the programming for the Kamusi Project has taken place in Africa.

Unlike other dictionaries, GOLD is built around concepts as well as words, which gives machine translation the context it needs to handle problems such as homonyms.

As New Scientist explained,

"This structure could solve one of the biggest challenges for machine translation. Asked to translate 'spring in her step' into French, for example, Google chooses printemps - the season - for 'spring'. Similar examples abound. The inability of computers to deal with homonyms - words that are spelled the same but have different meanings - is one reason why machine translations are often so garbled.

"Kamusi avoids this problem by recognising that 'spring' is associated with multiple concepts and prompting the user to say which is relevant. So the word 'spring', for instance, is linked to several concepts, including the season that comes before summer and a sudden upwards or forward motion."
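To make the idea concrete, here is a minimal sketch in Python of a concept-keyed lookup. It is not Kamusi's actual data model: the `Concept` class, the concept IDs, and the `translate` function are hypothetical names chosen for illustration, and the French term given for the "motion" sense is an illustrative pick rather than an entry taken from Kamusi. Only the two senses of "spring" and the French "printemps" come from the New Scientist example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One meaning, with the term used for it in each language (hypothetical model)."""
    concept_id: str
    gloss: str
    terms: dict = field(default_factory=dict)  # language code -> term


# Two concepts that share the English spelling "spring".
CONCEPTS = [
    Concept("spring.season", "the season that comes before summer",
            {"en": "spring", "fr": "printemps"}),
    # "bond" is an illustrative French choice, not a verified Kamusi entry.
    Concept("spring.motion", "a sudden upwards or forward motion",
            {"en": "spring", "fr": "bond"}),
]


def senses(word, lang):
    """Return every concept that the word is linked to in the given language."""
    return [c for c in CONCEPTS if c.terms.get(lang) == word]


def translate(word, source, target, concept_id=None):
    """Translate a word, requiring the caller to pick a concept if it is ambiguous."""
    candidates = senses(word, source)
    if concept_id is not None:
        candidates = [c for c in candidates if c.concept_id == concept_id]
    if len(candidates) != 1:
        glosses = "; ".join(f"{c.concept_id}: {c.gloss}" for c in candidates)
        raise LookupError(f"'{word}' needs one concept chosen from: {glosses}")
    return candidates[0].terms[target]


print(translate("spring", "en", "fr", "spring.season"))  # -> printemps
```

Calling `translate("spring", "en", "fr")` without a concept raises an error that lists both glosses, which mirrors the prompt-the-user step New Scientist describes.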

The project has been supported by a grant from the US National Endowment for the Humanities and relies on volunteers.

Benjamin hopes that bilingual speakers, especially of uncommon minority languages, will be motivated to add terms for free, since their additions will help them translate to more widely represented languages that are already in the system.

He also expects that companies doing business in Africa, such as those based in China and the United States, might be motivated to pay for large numbers of local words to be added to Kamusi, because so many African languages are poorly served by existing dictionaries.

The major drawback of the project is that, unlike the automated algorithmic approach of Google Translate, the Kamusi Project's multilingual dictionary requires the ongoing input of humans, which is time-consuming and expensive.

Benjamin estimates that, including wages and other expenses, it will cost about $5 to add and verify each new concept. Kamusi expects that a language dictionary containing 10,000 terms is sufficient for general use, so a basic dictionary would cost $50,000 per language.

As New Scientist pointed out, representing 10,000 concepts in 100 languages would require $5 million - a large sum, but hardly out of reach.
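The arithmetic behind those figures is simple enough to check in a few lines of Python. This is only a back-of-the-envelope sketch: the function and constant names are made up for illustration, and it assumes the $5 estimate holds uniformly for every concept in every language.

```python
# Figures quoted in the article; the flat per-concept rate is an assumption.
COST_PER_CONCEPT_USD = 5
CONCEPTS_PER_LANGUAGE = 10_000


def dictionary_cost(concepts=CONCEPTS_PER_LANGUAGE, languages=1,
                    cost_per_concept=COST_PER_CONCEPT_USD):
    """Total cost in US dollars of adding and verifying the given concepts."""
    return concepts * languages * cost_per_concept


print(dictionary_cost())               # one language: 50000
print(dictionary_cost(languages=100))  # 100 languages: 5000000
```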

A company like Google would have an obvious interest in improving its own automated translation product, and could provide that amount in an instant.

The Kamusi Project aims to keep its multilingual dictionary in the public domain as a free resource, so it remains to be seen how it would respond to private interest.

In the meantime, you can explore and add to it yourself at Kamusi.org.

© 2012 iScience Times All rights reserved. Do not reproduce without permission.
