A node test : label
+ name, *, text(), node(), â¦
+
An optional filter : an XPath expression whose
+ truth value determine is the node is kept
+
+
+
+
+
+
+
+
Filters (or predicates)
+
An arbitrary expression whose value is cast into a
+ Boolean (true keeps the selected node, false
+ discards it)
+
Example :
+
+ //A[ child::B and .//* = count(/descendant::X) mod 12 ]
+
+
+
* is an alias for child::*
+
// is an alias for /descendant-or-self::node()/
+
. is an alias for self::node()
+
+
BTW: XPath semantics
+ is crazyrich. In
+ the example we want all A nodes (ok) that have
+ a B child (ok) and that have a
+ descendant (hum⦠)
+ for which the (string) concatenation
+ of its text descendants (?) is equal to the number
+ of X elements in the document mod 12
+ (whaaat ?)
+
+
+
Tatoo : what is it ? how does it work ?
+
+
SXSI
+
Succint XML Self-Index: (limited) XPath query engine developped during my
+ postdoc at Nicta. It uses:
+
Succinct tree index of Arroyuello, Claude, Navarro :
+ stores a tree of size M with labels in an alphabet of
+ size θ in 4M(2 + log θ) + o(M) bits.
+ Full navigation + jumping from a node to the next one with a
+ given label (and also LCA of two nodes) in O(1).
+
+
Non det. alternating tree-automata of Maneth and
+ myself, on the
+ fly determinization and approximation of the sets of relevant
+ nodes (worst case O(M·|A|) to evaluate an automaton
+ of size A).
+
+
+
Can answer downward (no ancestor, parent, â¦) XPath with text
+ expressions of the form (f(., constant),
+ with f ∈ { contains, starts-with, ends-with, =, <, > ,
+ <=, >= }). Translation from XPath to automata is
+ linear in size
+
+
+
Pros and cons of SXSI
+
Pros
+
For the considered subset: much faster than state of the
+ art optimized XML DB
Retain polynomial (combined complexity) running
+ time, i.e.O(|Doc|·|Q|) for an XPath
+ query Q.
+
+
+
+
+
How does it work ?
+
Sample document: alphabet.xml
+
+
+
+
+
Example 1 : /descendant::M
+
+
+
+
Example 2
+ /descendant::M/ancestor::A[ descendant::P or parent::Q ]/child::*
+
+
+
+
Open problems, todos and future directions
+
+
Todos/open problems
+
+
Update the analysis of relevant nodes to automata
+ with upward moves to deduce nodes that can be skipped
+
Use the succinct index
+
Add joins (automata product ?)
+
Prove the correctness of the translation (with Matthieu,
+ in Coq)
+
Polynomial complexity relies on the fact that the
+ automaton is ranked (vague notion, only implemented
+ not formalized )
+
Automata now looks like μ-calculus formulæ
+ (historically, I chose automata because I did not know much
+ about logics and automata felt
+ more execution-oriented)
+
Implement static analysis through decision procedure:
+ emptiness, non-reachability etc.
+
+
+
+
Future directions
+
+
Probably want to ditch automata and used formula
+ instead
+
see if cycle-freeness of formula implies the ranking
+ property (I think so)
+
Like for formula, use dedicated moves/modalities for
+ attributes instead of poluting the automaton with the tree
+ encoding
+