Authors
Fatima Al-Raisi, Abdelwahab Bourai and Weijian Lin, Carnegie Mellon University, USA
Abstract
We present symbolic and neural approaches to Arabic paraphrasing that yield high paraphrasing accuracy. This is the first work on sentence-level paraphrase generation for Arabic and the first to use neural models to generate paraphrased Arabic sentences. We present and compare several methods for paraphrasing and for obtaining monolingual parallel data. We share a large-coverage phrase dictionary for Arabic and contribute a large parallel monolingual corpus that can be used to develop new seq-to-seq models for paraphrasing. This is the first large parallel monolingual corpus for Arabic. We also present the first results in Arabic paraphrasing using seq-to-seq neural methods. Additionally, we propose a novel automatic evaluation metric for paraphrasing that correlates highly with human judgement.
Keywords
Natural Language Processing, Paraphrasing, Sequence-to-Sequence Models, Neural Networks, Automatic Evaluation, Evaluation Metric, Data Resource