Article type: Research Article
Authors: Sánchez, A. [a, *] | Mello, C.A.B. [b] | Suárez, P.D. [a] | Lopes, A. [b]
Affiliations: [a] Departamento de Ciencias de la Computación, Universidad Rey Juan Carlos, Madrid, Spain | [b] Center of Informatics, Federal University of Pernambuco, Recife, Brazil
Correspondence: [*] Corresponding author: Dr. A. Sánchez, Departamento de Ciencias de la Computación, Universidad Rey Juan Carlos, 28933 Madrid, Spain. E-mail: angel.sanchez@urjc.es.
Abstract: There is strong interest in digitizing handwritten historical documents in order to preserve the cultural heritage of nations. In general, these manuscript images pose segmentation difficulties not found in non-historical documents. The problems stem from features such as paper aging, faded ink, back-to-front ink superposition and variable line skew, among others. This paper presents a methodology for detecting and extracting the text lines in images of complex handwritten historical documents. The proposed line segmentation algorithm computes a binary transition map of the document and then extracts and refines the corresponding line regions through skeletonization. To improve the accuracy of line segmentation, a new graph-based splitting method for separating touching lines is introduced. Once the text lines have been segmented, an algorithm based on mathematical morphology operators and position heuristics extracts the component words of each text line. The robustness and accuracy of our approach were tested on digitized pages from two complex historical document datasets: the correspondence of Nabuco and the family papers of Graham Bell. We also compared our algorithms favorably with other general line and word segmentation algorithms presented at the ICDAR 2007 Handwriting Segmentation Contest.
Keywords: Handwriting analysis, historical document images, text-line segmentation, word extraction, graph theory
DOI: 10.3233/ICA-2011-0365
Journal: Integrated Computer-Aided Engineering, vol. 18, no. 2, pp. 125-142, 2011
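The abstract mentions word extraction based on mathematical morphology operators and position heuristics, but does not give the details. As a rough illustration of the general idea only, and not the authors' algorithm, the sketch below splits an already segmented, binarized text line into words by filling short gaps in its column profile (a simplified one-dimensional stand-in for a morphological closing with a horizontal structuring element). The function names `column_has_ink`, `close_gaps` and `word_spans` and the parameter `max_gap` are hypothetical and chosen for this example.

```python
from typing import List, Tuple


def column_has_ink(line_img: List[List[int]]) -> List[bool]:
    """Return, for each column of a binary text-line image, whether it holds ink (1)."""
    return [any(column) for column in zip(*line_img)]


def close_gaps(profile: List[bool], max_gap: int) -> List[bool]:
    """Fill interior background runs of at most max_gap columns.

    Assumption for this sketch: gaps inside a word are shorter than
    gaps between words, so a single threshold separates the two.
    """
    closed = list(profile)
    n = len(profile)
    i = 0
    while i < n:
        if profile[i]:
            i += 1
            continue
        j = i
        while j < n and not profile[j]:
            j += 1
        # Only fill gaps that have ink on both sides and are short enough.
        if i > 0 and j < n and (j - i) <= max_gap:
            for k in range(i, j):
                closed[k] = True
        i = j
    return closed


def word_spans(line_img: List[List[int]], max_gap: int) -> List[Tuple[int, int]]:
    """Return inclusive (first_column, last_column) spans of candidate words."""
    closed = close_gaps(column_has_ink(line_img), max_gap)
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(closed):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x - 1))
            start = None
    if start is not None:
        spans.append((start, len(closed) - 1))
    return spans
```

In practice a fixed `max_gap` is fragile on historical handwriting, where letter and word spacing vary widely; a threshold estimated per line from the distribution of gap widths would be one way to add the kind of position heuristic the abstract refers to.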