Affiliations: Fondazione Bruno Kessler, Trento, Italy | DISI, University of Trento, Trento, Italy
Note: Corresponding author. Octavian Popescu, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Trento, Italy. E-mail: popescu@fbk.eu
Abstract: In this paper, we present various methods for estimating the K-number, i.e., the number of distinct entities carrying the same name in a corpus. We also analyze their characteristics and their impact on the Person Cross-Document Coreference (PCDC) task. These methods fall into two important classes: corpus-based methods and methods based on external resources. The experiments reported here show that estimating the K-number plays an important role in PCDC, from understanding the complexity of the task to improving the overall accuracy of coreference.
Keywords: Person Cross-Document Coreference, Cluster Number Estimation, Domain Knowledge, Corpus-based Methods