Article type: Research Article
Authors: Bayrak, Coşkun | Kolukísaoğlu, Hayrettin | Sieloff, Steve
Affiliations: Computer Science Department, University of Arkansas at Little Rock, Little Rock, AR, U.S.A. | Acxiom Corporation, Little Rock, AR, U.S.A.
Abstract: The World Wide Web (WWW) is becoming the most important source of information for business intelligence and information dissemination. Earlier information-gathering techniques such as surfing and sifting are proving insufficient for processing the vast volumes of data readily available from the Web. In addition, companies are being forced to integrate this vast data repository within specific cost, time, and reliability constraints. This paper presents the fundamentals of a system called "Browser Harness" (B2H) that extracts requested data from Web sites in a supervised fashion. The algorithmic background of the system is based on the tag structure of web pages, as HTML is the predominant choice for rendering web page content on the WWW. B2H is an interactive tool for harnessing data from semi-structured and structured web pages by analyzing the tag structure of the input page and locating the data in the HTML code. The extracted data is then exported to XML, delimited text, or database tables.
Keywords: Data extraction, web mining
Journal: Journal of Integrated Design & Process Science, vol. 7, no. 4, pp. 13-23, 2003
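
The abstract describes locating data through a page's tag structure and exporting it as delimited text. The following is a minimal sketch of that general idea only, not the B2H algorithm itself, which is not detailed here. It assumes the data of interest sits in HTML table cells; the names `TableRowExtractor`, `extract_rows`, `to_delimited`, and the sample page are hypothetical and chosen purely for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Hypothetical helper: collect the text of <td>/<th> cells, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_delimited(rows, delimiter="|"):
    """Serialize rows as delimited text, one row per line."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Illustrative input only; not taken from the paper.
SAMPLE_PAGE = """
<html><body>
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Ada</td><td>Little Rock</td></tr>
  <tr><td>Bob</td><td><b>Clifton</b></td></tr>
</table>
</body></html>
"""

if __name__ == "__main__":
    print(to_delimited(extract_rows(SAMPLE_PAGE)), end="")
```

Walking the tags as events rather than matching text patterns keeps the extraction tied to page structure, so nested inline markup such as `<b>` inside a cell does not disturb the result. A supervised tool would presumably let the user point at the target region instead of hard-coding table cells as this sketch does.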