Authors
Jessica C. Ramirez (1,2), Yuji Matsumoto (2) and Darwin Munoz (1); (1) Universidad Iberoamericana (UNIBE), Dominican Republic; (2) Nara Institute of Science and Technology, Japan
Abstract
The quality, size and coverage of a parallel corpus are fundamental factors in the performance of a Statistical Machine Translation (SMT) system. For some language pairs there is a considerable lack of resources suitable for Natural Language Processing tasks. This paper introduces a technique for extracting medical information from Wikipedia pages using a medical ontological dictionary, and evaluates it on a Japanese-Spanish SMT system. The study shows an increase in the BLEU score.
Keywords
Comparable Corpora, Dictionary, Ontology, Machine Translation