One area where data science has already had worldwide impact is in helping humans understand languages they don’t speak.
This challenge is getting bigger. In 2009, 37 languages were required to reach 98 per cent of people who were online. To reach the same percentage in 2012, 48 languages were needed. The solution, of course, is translation, but the era of big data demands a scalable solution. Enter the data scientists with Statistical Machine Translation!
Statistical machine translation (SMT) starts with a very big dataset of ‘good’ translations; these are typically taken from parallel corpora of texts, which have been manually translated into multiple languages, such as the European Parliament records of its proceedings. Using this data, systems can be trained to map out the correspondences between words and phrases in pairs of languages. These mappings are then stored and can be used in future translations.
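To make the idea of learning correspondences from parallel text concrete, here is a minimal sketch in Python. It uses expectation-maximisation in the style of IBM Model 1, a classic word-alignment model, to estimate how likely each target word is given a source word. This is a simplified illustration rather than a description of how Moses works internally; the three-sentence English-German corpus and the function names `train_word_table` and `best_translation` are invented for the example.

```python
from collections import defaultdict


def train_word_table(pairs, iterations=20):
    """Estimate P(target_word | source_word) with EM (IBM Model 1 style)."""
    source_vocab = {w for src, _ in pairs for w in src}
    target_vocab = {w for _, tgt in pairs for w in tgt}

    # Start with no preference: every target word is equally likely.
    uniform = 1.0 / len(target_vocab)
    prob = {(t, s): uniform for t in target_vocab for s in source_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word

        # E-step: share each target word's unit of mass among the source
        # words in the same sentence, in proportion to current estimates.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share

        # M-step: renormalise the expected counts into probabilities.
        prob = {(t, s): count[(t, s)] / total[s] for (t, s) in prob}

    return prob


def best_translation(prob, source_word):
    """Return the target word with the highest learned probability."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (English source, German target).
corpus = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]

table = train_word_table(corpus)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

Nothing in the code knows any German. The pairing of "house" with "Haus" emerges only because "das" also appears alongside "the" in a sentence without "house", which is the kind of evidence a large parallel corpus supplies at scale. Production systems go further and learn multi-word phrase correspondences as well.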
The work carried out at the University of Edinburgh has led to the development of Moses, the dominant open-source toolkit for building SMT systems. Moses started in 2005 as a research project in the School of Informatics. With European funding under the Euromatrix project, it has grown over the last seven years into the standard machine translation platform used by researchers across the globe.
Today, Moses is one of the most widely adopted machine translation toolkits in industry. Its maturity and quality, as well as its open-source licence, mean that it is often preferred over proprietary systems and its ‘DNA’ can also be traced in most proprietary systems.
TAUS, the innovation think tank and interoperability watchdog for the translation industry, recently stated in their Translation Technology Landscape Report 2013: “Moses has been an important development for the machine translation sector because companies can build custom MT engines without rewriting the translation engine. Moses has enabled many companies to launch custom machine translation offerings with a modest effort.”
One such company using Moses is Capita plc. Their Translation and Interpreting division, Capita TI, have developed SmartMATE, which is built using the Moses platform and is fully customisable for their clients. SmartMATE is capable of processing millions of words per day, providing content that can then be post-edited by human translators. Capita states that, through the use of SMT, a human translator's daily output can rise from 2,000 words translated from scratch to over 5,000 words post-edited. The solution works in over 40 different languages, ranging from Albanian to Vietnamese, and is often used by clients to help translate customer enquiries, webpages or user-generated content.