Research dataset discovery from research publications using web context

Singhal, Ayush; Srivastava, Jaideep

doi:10.3233/WEB-170354

Article type: Research Article

Authors: Singhal, Ayush^{a; *} | Srivastava, Jaideep^b

Affiliations: [a] National Center of Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA. E-mail: ayush.singhal@nih.gov | [b] Department of Computer Science and Engineering, University of Minnesota, Twin Cities, Minneapolis, Minnesota, USA. E-mail: srivasta@cs.umn.edu

Correspondence: [*] Corresponding author. E-mail: ayush.singhal@nih.gov.

Abstract: Scientific datasets play a crucial role in data-driven research. While several repositories curate public datasets, many more datasets and their usage are “hidden” in research publications. Hence, discovering a relevant dataset for a research topic requires in-depth investigation of several publications, tracking of dataset usage, and an exhaustive literature search. A search engine that directly handles the research dataset discovery problem is therefore extremely useful for the scientific community. In this work, we define an important paradigm of dataset search known as “dataset discovery in application context”. Unlike look-up search, where the user looks up a dataset in a repository, application-context-based search is search without knowledge of the dataset's name. Such searches arise when the user is looking for the best-fit dataset for a research problem. We show that in this paradigm, conventional methods that index the short description text of a dataset do not work, because the description lacks application text content. To alleviate this problem, we propose two models of search, namely (1) a user-profile-based search and (2) a keyword-based search. We show that in both models, dataset discovery is done in the application context by leveraging information from open-source web resources such as scholarly article repositories and academic search engines. The performance of the proposed models was tested with simulated test queries (user profiles) as well as with real-world user studies.

Keywords: Search engine, text mining, context generation, dataset search

Journal: Web Intelligence, vol. 15, no. 2, pp. 81-99, 2017

Published: 8 May 2017
