SPARQL with property paths on the Web
Issue title: ESWC 2015 Best Papers
Guest editors: Fabien Gandon, Marta Sabou and Harald Sack
Article type: Research Article
Authors: Hartig, Olaf [a, b, *] | Pirrò, Giuseppe [c]
Affiliations: [a] Hasso Plattner Institute, Universität Potsdam, Germany | [b] Department of Computer and Information Science (IDA), Linköping University, Sweden. E-mail: olaf.hartig@liu.se | [c] Italian National Research Council (ICAR-CNR), Rende (CS), Italy. E-mail: pirro@icar.cnr.it
Correspondence: [*] Corresponding author. E-mail: olaf.hartig@liu.se.
Abstract: Linked Data on the Web represents an immense source of knowledge suitable for automatic processing and querying. Approaches to querying Linked Data differ in the degree of centralization they adopt. On the one hand, the SPARQL query language, originally defined for querying single datasets, has been enhanced with features to query federations of datasets; however, this enhancement is not sufficient to cope with the distributed nature of data sources available as Linked Data. On the other hand, extensions or variations of SPARQL aim to find trade-offs between centralized and fully distributed querying; the idea is to move part of the computational load from the servers to the clients. Despite the variety and the relative merits of these approaches, as of today there is no standard language for querying Linked Data on the Web. A specific requirement for such a language, if it is to capture the distributed, graph-like nature of Linked Data sources on the Web, is support for graph navigation. Recently, SPARQL has been extended with a navigational feature called property paths (PPs). However, the semantics of SPARQL restricts the scope of navigation via PPs to single RDF graphs. This restriction limits the applicability of PPs for querying distributed Linked Data sources on the Web. To fill this gap, in this paper we provide formal foundations for evaluating PPs on the Web, thus contributing to the definition of a query language for Linked Data. We first introduce a family of reachability-based query semantics for PPs that distinguish between navigation on the Web and navigation at the data level. Thereafter, we consider an alternative query semantics that couples Web graph navigation and data-level navigation; we call it context-based semantics. Given these semantics, we find that for some PP-based SPARQL queries a complete evaluation on the Web is not possible. To study this phenomenon we introduce a notion of Web-safeness of queries, and establish a decidable syntactic property that enables systems to identify queries that are Web-safe. In addition to establishing these formal foundations, we conducted an experimental comparison of the context-based semantics and a reachability-based semantics. Our experiments show that evaluating a PP-based query under the context-based semantics requires significantly fewer dereferencing operations, but the computed query result may contain fewer solutions.
Keywords: Property paths, Web navigational language, Web safeness, SPARQL
DOI: 10.3233/SW-160237
Journal: Semantic Web, vol. 8, no. 6, pp. 773-795, 2017