We present results of our study of pattern-based XML query languages. A variety of analogs of conjunctive queries over XML documents have been studied in the literature. We focus on languages based on tree patterns, which follow the structure of tree documents. They arise naturally in many problems related to integrating and exchanging data and to handling incomplete information. In this talk, we concentrate on the problems of finding certain answers to queries, and on static analysis, in particular, query containment.
The problem of computing certain answers arises when one queries incompletely specified databases, i.e., databases with missing information. As often happens, the complexity of the problem jumps when one moves from relations to XML. Nonetheless, we identified large relevant classes of queries for which efficient algorithms can be developed. Curiously, for some of these classes, no analogous results existed in the relational world. In fact, we uncovered a well-behaved class of relational queries which had been overlooked so far.
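To make the notion of certain answers concrete, here is a minimal sketch for the relational setting only; it does not reproduce the XML algorithms discussed in the talk. It assumes a database with marked nulls and a conjunctive query, and relies on the classical fact that naive evaluation (treat each null as a fresh value, then discard answers that contain nulls) yields the certain answers for conjunctive queries under open-world semantics. The names `Null`, `evaluate`, `certain_answers`, the `?` prefix for variables, and the sample data are all hypothetical.

```python
class Null:
    """A marked null: a placeholder for an unknown value.
    Two nulls are equal only if they are the same object."""
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"Null({self.label})"


def is_var(term):
    # Assumption of this sketch: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")


def evaluate(head, body, db):
    """Naively evaluate a conjunctive query, treating nulls as ordinary values.
    head: tuple of variables that all occur in body.
    body: list of atoms (relation_name, args).
    db:   dict mapping relation_name to a set of tuples."""
    results = set()

    def extend(i, env):
        if i == len(body):
            results.add(tuple(env[v] for v in head))
            return
        rel, args = body[i]
        for row in db.get(rel, ()):
            if len(row) != len(args):
                continue
            new_env = dict(env)
            ok = True
            for arg, val in zip(args, row):
                if is_var(arg):
                    if arg in new_env and new_env[arg] != val:
                        ok = False
                        break
                    new_env[arg] = val
                elif arg != val:  # a constant in the query must match exactly
                    ok = False
                    break
            if ok:
                extend(i + 1, new_env)

    extend(0, {})
    return results


def certain_answers(head, body, db):
    # Keep only the naive answers that mention no null.
    return {t for t in evaluate(head, body, db)
            if not any(isinstance(v, Null) for v in t)}


# Hypothetical data: ann's department is unknown, but it is located in paris.
d1 = Null("d1")
db = {
    "Works": {("ann", d1), ("bob", "sales")},
    "Dept": {(d1, "paris"), ("sales", "rome")},
}
joined = certain_answers(("?p", "?c"),
                         [("Works", ("?p", "?d")), ("Dept", ("?d", "?c"))], db)
depts = certain_answers(("?p", "?d"), [("Works", ("?p", "?d"))], db)
```

In this example the join still returns ann with paris, because the same null links the two facts whatever its actual value is, whereas asking for ann's department returns nothing certain.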
Testing for query containment is a fundamental task in query optimization. In the relational case, classical homomorphism-based tools lead to algorithms of reasonable complexity. In XML, such techniques can be applied only to very simple queries. Beyond those classes of queries, they can either be adapted by using more sophisticated data structures or, provably, cannot be adapted at all. In addition, we look at the containment problem in the presence of extra features (such as schemas for documents).
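As an illustration of the classical homomorphism-based approach in the relational case, the following is a minimal brute-force sketch of the Chandra-Merlin test: a conjunctive query Q1 is contained in Q2 exactly when there is a homomorphism from Q2 to Q1 that maps the head of Q2 onto the head of Q1. The query encoding, the `?` prefix for variables, and the function names are assumptions made for this sketch; it is exponential in the number of variables and is not the technique developed for XML patterns in the talk.

```python
from itertools import product


def is_var(term):
    # Assumption of this sketch: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")


def contained_in(q1, q2):
    """Return True if q1 is contained in q2.
    Each query is a pair (head, body): head is a tuple of terms,
    body is a list of atoms (relation_name, args)."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False

    # Variables of q2 that the homomorphism must map.
    vars2 = list({t for _, args in body2 for t in args if is_var(t)}
                 | {t for t in head2 if is_var(t)})
    # Candidate images: every term that occurs in q1.
    terms1 = list({t for _, args in body1 for t in args} | set(head1))
    atoms1 = {(rel, tuple(args)) for rel, args in body1}

    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))

        def apply(term):
            # Constants are mapped to themselves.
            return h.get(term, term)

        if tuple(apply(t) for t in head2) != tuple(head1):
            continue
        if all((rel, tuple(apply(t) for t in args)) in atoms1
               for rel, args in body2):
            return True
    return False


# Q1(x) :- E(x, y), E(y, x)      Q2(x) :- E(x, y)
q1 = (("?x",), [("E", ("?x", "?y")), ("E", ("?y", "?x"))])
q2 = (("?x",), [("E", ("?x", "?y"))])
q1_in_q2 = contained_in(q1, q2)
q2_in_q1 = contained_in(q2, q1)
```

Here Q1 is contained in Q2, since the identity mapping sends the single atom of Q2 into Q1, while the converse fails because no mapping sends both atoms of Q1 into the body of Q2.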
The talk will be based on papers from ICDT'12 and ICDT'13.