Jörg Tiedemann

Affiliation: Uppsala University, Sweden.

Title of talk: "Linguistic Solidarity in Machine Translation - Statistical MT for Under-Resourced Languages and Domains"

Abstract: Statistical Machine Translation (SMT) is by far the dominant approach to automatic translation in current research and system development. The success of statistical methods depends on the availability of large bodies of parallel and monolingual text. However, appropriate resources do not exist or are not accessible for most language pairs. Furthermore, current SMT models are very sensitive to domain shifts, yet domain-specific data is usually sparse or difficult to obtain, even for "high-density" languages. In my talk I will look at specific problems for selected language pairs and discuss possibilities for training machine translation systems with extremely sparse data sets. In particular, I will look at triangulation and pivot-based translation as a way to overcome the shortage of data. The main idea of this approach is to use intermediate languages to support poorly resourced languages. I will focus on closely related languages as pivots and emphasize the use of low-level translation techniques that take advantage of structural and lexical similarities between those languages.

