Authors
Karima Abidi¹, Kamel Smaili²; ¹Ecole Superieure d'Informatique (ESI), Algeria; ²Campus Scientifique LORIA, France
Abstract
In this paper, we propose a method for aligning comparable bilingual tweets that not only takes into account the specific characteristics of a tweet but also handles proper names, dates, and numbers in two different languages. This makes it possible to retrieve more relevant target tweets. Matching proper names between Arabic and English is difficult because the two languages use different scripts. To address this, we use an approach that projects the sounds of an English proper name into Arabic and aligns it with the most appropriate proper name. We evaluated the method with a classical measure and compared it to the one we developed. Experiments conducted on two parallel corpora show that our measure outperforms the baseline by 5.6% in recall at rank 1 (R@1).
Keywords
Comparability measure; Arabic stemming; Proper names; Soundex; Twitter