Article type: Research Article
Authors: Laukaitis, Algirdas [1,*] | Plikynas, Darius [2] | Ostasius, Egidijus [1]
Affiliations: [1] Vilnius Gediminas Technical University, Fundamental Science Faculty | [2] Vilnius University, Institute of Data Science and Digital Technologies. E-mails: algirdas.laukaitis@vgtu.lt, darius.plikynas@mii.vu.lt, egidijus.ostasius@vgtu.lt
Correspondence: [*] Corresponding author.
Abstract: In this paper, we propose a framework for extracting translation memory from a corpus of fiction and non-fiction books. In recent years, several methods have been proposed for aligning bilingual corpora and extracting translation memory from legal and technical documents. However, when applied to a corpus of translated fiction and non-fiction books, existing alignment algorithms yield low precision. To address this problem, we propose a new method that combines existing alignment algorithms with a proactive learning approach. We define several feature functions that are used to build two classifiers, one for text filtering and one for alignment. We report results for the English-Lithuanian language pair on a bilingual corpus of 200 books and demonstrate a significant improvement in alignment accuracy over currently available alignment systems.
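The abstract does not list the feature functions it uses. To make the idea of "feature functions feeding a classifier" concrete, here is a minimal sketch under stated assumptions. The three features (character length ratio, number overlap, terminal punctuation match), the names `SentencePair`, `features` and `score`, and the weights are all hypothetical illustrations chosen for this example. They are not the authors' features and were not learned from data.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class SentencePair:
    """A hypothetical candidate alignment: one English and one Lithuanian sentence."""
    source: str
    target: str


def length_ratio(pair: SentencePair) -> float:
    # Log ratio of character lengths; close to 0 when the two lengths are similar.
    return math.log((len(pair.target) + 1) / (len(pair.source) + 1))


def number_overlap(pair: SentencePair) -> float:
    # Jaccard overlap of digit sequences, since numbers usually survive translation unchanged.
    src = set(re.findall(r"\d+", pair.source))
    tgt = set(re.findall(r"\d+", pair.target))
    if not src and not tgt:
        return 1.0  # no numbers on either side: treat as consistent
    return len(src & tgt) / len(src | tgt)


def _last_char(text: str) -> str:
    stripped = text.rstrip()
    return stripped[-1] if stripped else ""


def punctuation_match(pair: SentencePair) -> float:
    # 1.0 if both sentences end with the same character (typically terminal punctuation).
    return 1.0 if _last_char(pair.source) == _last_char(pair.target) else 0.0


def features(pair: SentencePair) -> list[float]:
    # Feature vector; the absolute log ratio penalises length mismatch in either direction.
    return [abs(length_ratio(pair)), number_overlap(pair), punctuation_match(pair)]


def score(pair: SentencePair,
          weights: tuple[float, ...] = (-2.0, 1.0, 0.5),
          bias: float = 0.0) -> float:
    # Hypothetical linear (logistic) classifier; these weights are illustrative, not trained.
    z = bias + sum(w * f for w, f in zip(weights, features(pair)))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    good = SentencePair("Chapter 12 begins here.", "12 skyrius prasideda čia.")
    bad = SentencePair(
        "Chapter 12 begins here.",
        "Jis ilgai tylėjo, žiūrėdamas pro langą į tamsią gatvę, kurioje niekas nejudėjo",
    )
    print(round(score(good), 2), round(score(bad), 2))
```

In a real system, the weights would be fitted on labelled sentence pairs, for example with a standard logistic-regression implementation. A filtering classifier and an alignment classifier could share features like these but be trained on different labels.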
Keywords: alignment of corpora, alignment of digitized books, machine translation, natural language processing
Journal: Informatica, vol. 29, no. 4, pp. 693-710, 2018