Article type: Research Article
Authors: Wei, Jiaxin [a] | Yang, Jin [b], * | Liu, Xinyang [a]
Affiliations: [a] Department of Finance and Commerce, Qinghai Higher Vocational and Technical Institute, Haidong, China | [b] Office of Academic Research, Qinghai Higher Vocational and Technical Institute, Haidong, China
Correspondence: [*] Corresponding author. Jin Yang, Office of Academic Research, Qinghai Higher Vocational and Technical Institute, Haidong 810799, China. E-mail: yangjin@qhvtc.edu.cn.
Abstract: Because regulatory authorities have intensified off-balance-sheet disclosure requirements, financial reports now contain a substantial amount of information beyond the financial statements. Consequently, the footnotes in financial reports are now longer than the financial statements themselves. This poses a novel challenge for regulators and users of financial reports, who must manage this information efficiently. Financial reports have a clear structure and contain abundant structured information that can be used for information extraction, automatic summarization, and information retrieval. Extracting headings and paragraph content from a financial report yields the framework of the annual report text. This paper focuses on extracting the structural framework of annual report texts and introduces an OpenCV-based method for text framework extraction using computer vision. The proposed method employs morphological image dilation to distinguish headings from the main body of the text. Moreover, this paper combines the proposed method with a traditional, rule-based extraction method that exploits the characteristic features of numbers and symbols at the beginning of headings. This combination results in an optimized framework extraction method that produces a more concise text framework.
Keywords: OpenCV, dilation operation, text structure extraction
DOI: 10.3233/JIFS-234170
Journal: Journal of Intelligent & Fuzzy Systems, vol. 46, no. 4, pp. 8089-8108, 2024
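Illustrative sketch: The abstract mentions a traditional rule-based step that recognizes headings by the numbers and symbols at their start. The abstract does not give the rules themselves, so the following is only a minimal sketch of that general idea, not the authors' method. The patterns, the length limit, and the names `HEADING_PATTERNS`, `MAX_HEADING_LENGTH`, `heading_level`, and `extract_framework` are all assumptions made for illustration. The image-based dilation step is not shown.

```python
import re
from typing import List, Optional, Tuple

# Assumed limit: lines longer than this are treated as body text.
MAX_HEADING_LENGTH = 80

# Hypothetical numbering patterns, checked from most to least specific.
# Each entry pairs an assumed outline level with a compiled pattern.
HEADING_PATTERNS = [
    (1, re.compile(r"^(Chapter|Section)\s+\d+\b", re.IGNORECASE)),
    (3, re.compile(r"^\d+\.\d+\.\d+\s+\S")),   # e.g. "1.1.1 Domestic sales"
    (2, re.compile(r"^\d+\.\d+\s+\S")),        # e.g. "1.1 Principal Activities"
    (1, re.compile(r"^\d+[.)]?\s+\S")),        # e.g. "1 Company Overview"
    (2, re.compile(r"^\(\d+\)\s*\S")),         # e.g. "(1) Accounts receivable"
]


def heading_level(line: str) -> Optional[int]:
    """Return an assumed outline level for a heading-like line, else None."""
    text = line.strip()
    # Assumption: headings are short and do not end like a sentence or clause.
    if not text or len(text) > MAX_HEADING_LENGTH or text.endswith((".", ";", ",")):
        return None
    for level, pattern in HEADING_PATTERNS:
        if pattern.match(text):
            return level
    return None


def extract_framework(lines: List[str]) -> List[Tuple[int, str]]:
    """Keep only heading-like lines, paired with their assumed level."""
    framework = []
    for line in lines:
        level = heading_level(line)
        if level is not None:
            framework.append((level, line.strip()))
    return framework


sample = [
    "1 Company Overview",
    "The company was founded in 1998 and operates in three regions.",
    "1.1 Principal Activities",
    "Revenue grew during the reporting period.",
    "(1) Accounts receivable",
    "1.1.1 Domestic sales",
]
framework = extract_framework(sample)
for level, title in framework:
    print("  " * (level - 1) + title)
```

A purely textual rule like this misclassifies short body lines that happen to start with a number, which is one reason a visual cue such as the dilation-based step described in the abstract could be a useful complement.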