Affiliations: [a] Department of Computer Science and Engineering, National Taiwan Ocean University, 2 Pei-Ning Road, Keelung 20224, Taiwan | [b] Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan
Abstract: Textual Entailment (TE) is the task of recognizing entailment, paraphrase, and contradiction relations between a given text pair. The goal of textual entailment research is to develop a core inference component that can be applied to various domains such as question answering (QA) and information retrieval (IR). We computed several rank correlations over the test data and system results of the NTCIR-10 RITE-2 task to find out how datasets and evaluation metrics correlate with one another. We also constructed the RITE4QA datasets for the RITE-2 task in a QA scenario in order to assess how applicable RITE techniques are to QA systems. We find that datasets created from different sources or in different ways can hardly predict each other; however, when testing on artificial pairs, the system ranking by RITE metrics has a moderate correlation with the ranking by QA metrics. Both RITE metrics and QA metrics are stable within their own subtasks.
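A rank correlation measures how consistently two evaluation setups (for example, a RITE metric and a QA metric) order the same set of participating systems. The abstract does not state which coefficient was used, so the minimal sketch below assumes Kendall's tau (the tau-a variant, where tied pairs count as neither concordant nor discordant). The function name `kendall_tau` and the scores in the usage example are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists for the same systems.

    scores_a[i] and scores_b[i] are assumed to be the scores of system i
    under two different metrics or datasets.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same systems")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two systems")

    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both metrics order systems i and j the same way.
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count as neither concordant nor discordant.

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical scores for four systems under a RITE metric and a QA metric.
rite_scores = [0.70, 0.65, 0.60, 0.55]
qa_scores = [0.40, 0.45, 0.30, 0.20]
print(kendall_tau(rite_scores, qa_scores))  # 5 concordant, 1 discordant pair
```

With these made-up scores only systems 0 and 1 swap places, so the coefficient is (5 - 1) / 6, about 0.67. A library routine such as SciPy's `scipy.stats.kendalltau` computes the tie-adjusted tau-b variant by default, which coincides with tau-a when there are no ties.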