Applying various distance functions and feature extraction schemes to ambiguity resolution

Rezapour, Abdoreza; Fakhrahmad, Seyed Mostafa; Sadreddini, Mohammad Hadi

doi:10.3233/IDA-173385

Article type: Research Article

Authors: Rezapour, Abdoreza^* | Fakhrahmad, Seyed Mostafa | Sadreddini, Mohammad Hadi

Affiliations: Department of Computer Science and IT, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran

Correspondence: [*] Corresponding author: Abdoreza Rezapour, Department of Computer Science and IT, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran. E-mail: Abdoreza.Rezapour@gmail.com.

Abstract: Word Sense Disambiguation (WSD), one of the most challenging problems in machine translation, can be treated as a classification problem. In this paper, we use K-Nearest-Neighbor, one of the most popular classification methods, together with several knowledge-based resources to design a WSD scheme. The success of K-Nearest-Neighbor depends heavily on two factors: the features used to represent the context in which an ambiguous word occurs, and the distance/similarity measure used to compare text vectors. The present study therefore focuses on these two matters. For the first, we extract three sets of features: syntactic, lexical, and semantic. To produce enriched and useful corpora, we apply preprocessing steps. We also carry out a feature selection process and a feature weighting policy to fine-tune the classifier. For the second, we try several distance/similarity metrics (rather than a single metric) to find the most suitable one. We also assign feature weights and propose a weighted formula for every metric. Moreover, to show that the proposed schemes are not language-dependent, we apply them to two sets of data: English and Persian corpora. The evaluation results, with regard to the feature selection and feature weighting strategies, show that the semantic and syntactic features have a significant effect on the classification ability of the system. The results are also encouraging compared to the state of the art.
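
The idea of pairing K-Nearest-Neighbor with feature-weighted distance/similarity metrics can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual weighted formulas, so the two metrics below (a weighted Euclidean distance and a weighted cosine distance) are common textbook variants chosen as assumptions; the function names, the toy feature vectors for the ambiguous word "bank", and the weight values are all hypothetical.

```python
import math
from collections import Counter


def weighted_euclidean(x, y, w):
    """Euclidean distance where each squared difference is scaled by a feature weight."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def weighted_cosine_distance(x, y, w):
    """1 minus the cosine similarity computed with per-feature weights."""
    dot = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    norm_x = math.sqrt(sum(wi * xi * xi for xi, wi in zip(x, w)))
    norm_y = math.sqrt(sum(wi * yi * yi for yi, wi in zip(y, w)))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_x * norm_y)


def knn_predict(train, query, weights, k=3, distance=weighted_euclidean):
    """Return the majority sense among the k training contexts nearest to the query."""
    ranked = sorted(train, key=lambda item: distance(item[0], query, weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each context of "bank" is encoded as
# [a water-related word is nearby, a money-related word is nearby, target is a noun].
TRAIN = [
    ([1, 0, 1], "river_bank"),
    ([1, 0, 0], "river_bank"),
    ([0, 1, 1], "financial_bank"),
    ([0, 1, 0], "financial_bank"),
    ([0, 1, 1], "financial_bank"),
]

# Hypothetical weights: the two context features count more than the part-of-speech feature.
WEIGHTS = [2.0, 2.0, 0.5]

if __name__ == "__main__":
    query = [1, 0, 1]
    for metric in (weighted_euclidean, weighted_cosine_distance):
        print(metric.__name__, knn_predict(TRAIN, query, WEIGHTS, k=3, distance=metric))
```

Passing the metric as a parameter mirrors the abstract's point that several distance/similarity measures are tried with the same classifier; swapping metrics or weights requires no change to the voting logic.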

Keywords: Machine translation, word sense disambiguation, K-Nearest-Neighbor, similarity metric, distance metric, feature weighting, feature selection

Journal: Intelligent Data Analysis, vol. 22, no. 3, pp. 617-638, 2018

Published: 7 May 2018
