Enhancing virtual ontology based access over tabular data with Morph-CSV

Chaves-Fraga, David; Ruckhaus, Edna; Priyatna, Freddy; Vidal, Maria-Esther; Corcho, Oscar

doi:10.3233/SW-210432

Issue title: Storing, Querying, and Benchmarking the Web of Data

Guest editors: Muhammad Saleem, Ruben Verborgh, Muhammad Intizar Ali and Olaf Hartig

Article type: Research Article

Authors: Chaves-Fraga, David [a, *] | Ruckhaus, Edna [a] | Priyatna, Freddy [a] | Vidal, Maria-Esther [b] | Corcho, Oscar [a]

Affiliations: [a] Ontology Engineering Group, Universidad Politécnica de Madrid, Spain. E-mails: dchaves@fi.upm.es, eruckhaus@fi.upm.es, fpriyatna@fi.upm.es, ocorcho@fi.upm.es | [b] TIB - Leibniz Information Centre for Science and Technology and L3S Leibniz University of Hannover, Germany. E-mail: maria.vidal@tib.eu

Correspondence: [*] Corresponding author. E-mail: dchaves@fi.upm.es.

Abstract: Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV and JSON files), either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by treating each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented (e.g., referential integrity among sources, datatypes, or data integrity); thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, the efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced when the generated SQL query is evaluated. Our work focuses on applying implicit constraints to the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints and that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of enhancing query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark; transportation with the GTFS-Madrid benchmark; and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, without and with the incorporation of Morph-CSV. The observed results suggest that Morph-CSV can speed up the total query execution time by up to two orders of magnitude while still producing all the query answers.
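The abstract notes that loading each CSV file as a plain table, without datatype constraints, can reduce the completeness of query answers. The following minimal sketch illustrates that effect with Python's built-in `sqlite3` module; it is not Morph-CSV's implementation. The table `product`, its columns, the inline CSV data, and the helper functions are hypothetical, and the filter `price > 50` merely stands in for the kind of SQL a SPARQL-to-SQL engine might generate from `FILTER(?price > 50)`. The behavior shown relies on SQLite's type-affinity rules, under which the literal `50` is compared as the string `'50'` against a TEXT column.

```python
import csv
import io
import sqlite3

# Hypothetical CSV source; a real OBDA setup would read CSV files
# described by mappings rather than an inline string.
PRODUCTS_CSV = """id,price
a,9
b,10
c,100
"""


def load(conn, price_type):
    """Load the CSV into table `product`, declaring `price` with the given SQL type."""
    conn.execute(f"CREATE TABLE product (id TEXT PRIMARY KEY, price {price_type})")
    rows = list(csv.DictReader(io.StringIO(PRODUCTS_CSV)))
    conn.executemany("INSERT INTO product (id, price) VALUES (:id, :price)", rows)
    if price_type == "INTEGER":
        # Once the datatype is declared, an index on the column is also meaningful.
        conn.execute("CREATE INDEX idx_product_price ON product (price)")


def expensive_products(price_type):
    """Run a numeric filter over the loaded table and return the matching ids."""
    conn = sqlite3.connect(":memory:")
    load(conn, price_type)
    cur = conn.execute("SELECT id FROM product WHERE price > 50 ORDER BY id")
    result = [row[0] for row in cur]
    conn.close()
    return result


print(expensive_products("TEXT"))     # every value stored as a string
print(expensive_products("INTEGER"))  # datatype constraint applied
```

With the TEXT column the comparison is lexicographic, so `'9' > '50'` holds while `'100' > '50'` does not: the first call prints `['a']`, whereas the typed table yields the expected `['c']`.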

Keywords: Knowledge graphs, tabular data, mapping languages, constraints

Journal: Semantic Web, vol. 12, no. 6, pp. 869-902, 2021

Published: 4 October 2021