The Data Lab

Machine Translation

Technical Skills 07/11/2014

One area where data science has already had worldwide impact is in helping humans understand languages they don’t speak.

This challenge is getting bigger. In 2009, 37 languages were required to reach 98 per cent of people who were online. To reach the same percentage in 2012, 48 languages were needed. The solution, of course, is translation, but the era of big data demands a scalable solution. Enter the data scientists with Statistical Machine Translation!

Statistical machine translation (SMT) starts with a very big dataset of ‘good’ translations; these are typically taken from parallel corpora of texts, which have been manually translated into multiple languages, such as the European Parliament records of its proceedings. Using this data, systems can be trained to map out the correspondences between words and phrases in pairs of languages. These mappings are then stored and can be used in future translations.
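As a rough illustration of how such correspondences can be learned from parallel text, the sketch below trains IBM Model 1, a classic word-alignment model, on a three-sentence toy corpus. It is not a description of how Moses itself is implemented; the corpus, the function names and the iteration count are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (German tokens, English tokens).
TOY_CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]


def train_word_translation_table(sentence_pairs, iterations=20):
    """Estimate t(target_word | source_word) with IBM Model 1 (EM)."""
    target_vocab = {tw for _, tgt in sentence_pairs for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start every co-occurring word pair at the same probability.
    table = {
        (tw, sw): uniform
        for src, tgt in sentence_pairs
        for sw in src
        for tw in tgt
    }
    for _ in range(iterations):
        counts = defaultdict(float)  # expected count of each (target, source) pair
        totals = defaultdict(float)  # expected count of each source word
        # E-step: share each target word among the source words in its sentence.
        for src, tgt in sentence_pairs:
            for tw in tgt:
                norm = sum(table[(tw, sw)] for sw in src)
                for sw in src:
                    share = table[(tw, sw)] / norm
                    counts[(tw, sw)] += share
                    totals[sw] += share
        # M-step: renormalise the expected counts into probabilities.
        for (tw, sw), count in counts.items():
            table[(tw, sw)] = count / totals[sw]
    return table


def most_likely_translation(table, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in table.items() if sw == source_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    learned = train_word_translation_table(TOY_CORPUS)
    for word in ["das", "Haus", "Buch", "ein"]:
        print(word, "->", most_likely_translation(learned, word))
```

The returned dictionary plays the role of the stored mappings described above: once trained, it can be looked up for new text without revisiting the corpus. Production systems work with phrases rather than single words and with far larger corpora, so this is only a minimal sketch of the idea.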

The work carried out at the University of Edinburgh has led to the development of Moses, the dominant open-source toolkit for building SMT systems. Moses started in 2005 as a research project in the School of Informatics. With European funding under the Euromatrix project, it has grown over the last seven years into the standard machine translation platform used by researchers across the globe.

Today, Moses is one of the most widely adopted machine translation toolkits in industry. Its maturity, quality and open-source licence mean that it is often preferred over proprietary systems, and its ‘DNA’ can be traced in most of those systems too.

TAUS, the innovation think tank and interoperability watchdog for the translation industry, stated in its Translation Technology Landscape Report 2013: “Moses has been an important development for the machine translation sector because companies can build custom MT engines without rewriting the translation engine. Moses has enabled many companies to launch custom machine translation offerings with a modest effort.”

One such company using Moses is Capita plc. Its Translation and Interpreting division, Capita TI, has developed SmartMATE, which is built on the Moses platform and is fully customisable for clients. SmartMATE can process millions of words per day, producing content that human translators then post-edit. Capita states that, with SMT, a human translator can go from translating 2,000 words per day to post-editing over 5,000 words per day. The solution works in over 40 languages, ranging from Albanian to Vietnamese, and clients often use it to translate customer enquiries, webpages or user-generated content.

The Data Lab is part of the University of Edinburgh, a charitable body registered in Scotland with registration number SC005336.

© 2025 The Data Lab