Graph database query languages feature expressive
yet computationally expensive pattern matching capabilities.
Answering optional query clauses in SPARQL for instance renders
the query evaluation problem immediately PSPACE-complete.
Light-weight graph pattern matching relations, such as simulation,
have recently been investigated as promising alternatives to
more expensive query mechanisms like, e. g., computing subgraph
isomorphism. Still, pattern matching alone lacks expressive query
capabilities: graph patterns may be combined by usual inner
joins. However, including more sophisticated operators is inevitable
to make solutions more useful for emerging applications.
In this paper we bridge this gap by introducing a new dual
simulation process operating on SPARQL queries. In addition
to supporting the full syntactic structure of SPARQL queries,
it features polynomial-time pattern matching to compute an
overapproximation of the query results. Moreover, to achieve
running times competing with state-of-the-art database systems,
we develop a novel algorithmic solution to dual simulation graph
pattern matching, based on a system of inequalities that allows
for several optimization heuristics. Finally, we achieve soundness
of our process for SPARQL queries including UNION, AND and
OPTIONAL operators not restricted to well-designed patterns.
Our experiments on synthetic and real-world graph data promise
a clear gain for graph database systems when incorporating the
new dual simulation techniques.
|