An optional filter : an XPath expression whose
truth value determine is the node is kept
Filters (or predicates)
An arbitrary expression whose value is cast into a
Boolean (true keeps the selected node, false
discards it)
Example :
//A[ child::B and .//* = count(/descendant::X) mod 12 ]
* is an alias for child::*
// is an alias for /descendant-or-self::node()/
. is an alias for self::node()
BTW: XPath semantics
is crazyrich. In
the example we want all A nodes (ok) that have
a B child (ok) and that have a
descendant (hum… )
for which the (string) concatenation
of its text descendants (?) is equal to the number
of X elements in the document mod 12
(whaaat ?)
Tatoo : what is it ? how does it work ?
SXSI
Succint XML Self-Index: (limited) XPath query engine developped during my
postdoc at Nicta. It uses:
Self-Index developped by Mäkinen, Sirén, Valimaki :
stores a text T of N characters (in an
alphabet of size σ) in N·Hk(T)
+ o(N·log σ) bits.
Can answer queries like « first position after i that
matches a constant string s » in O(|s|·log σ)
Succinct tree index of Arroyuello, Claude, Navarro :
stores a tree of size M with labels in an alphabet of
size θ in 4M(2 + log θ) + o(M) bits.
Full navigation + jumping from a node to the next one with a
given label (and also LCA of two nodes) in O(1).
Non det. alternating tree-automata of Maneth and
myself, on the
fly determinization and approximation of the sets of relevant
nodes (worst case O(M·|A|) to evaluate an automaton
of size A).
Can answer downward (no ancestor, parent, …) XPath with text
expressions of the form (f(., constant),
with f ∈ { contains, starts-with, ends-with, =, <, > ,
<=, >= }). Translation from XPath to automata is
linear in size
Pros and cons of SXSI
Pros
For the considered subset: much faster than state of the
art optimized XML DB
The subset is too restricted (no upward axes, no
arbitrary joins)
Encoding was wrong in subtle ways (e.g. in some corner
cases with double negations in filters)
Tree Automata TOOlkit
Ré-implmentation of the automata part of SXSI
Supports full navigational XPath (upward axis)
Retain polynomial (combined complexity) running
time, i.e.O(|Doc|·|Q|) for an XPath
query Q.
How does it work ?
Sample document: alphabet.xml
Example 1 : /descendant::M
Example 2
/descendant::M/ancestor::A[ descendant::P or parent::Q ]/child::*
Open problems, todos and future directions
Todos/open problems
Update the analysis of relevant nodes to automata
with upward moves to deduce nodes that can be skipped
Use the succinct index
Add joins (automata product ?)
Prove the correctness of the translation (with Matthieu,
in Coq)
Polynomial complexity relies on the fact that the
automaton is ranked (vague notion, only implemented
not formalized )
Automata now looks like μ-calculus formulæ
(historically, I chose automata because I did not know much
about logics and automata felt
more execution-oriented)
Implement static analysis through decision procedure:
emptiness, non-reachability etc.
Future directions
Probably want to ditch automata and used formula
instead
see if cycle-freeness of formula implies the ranking
property (I think so)
Like for formula, use dedicated moves/modalities for
attributes instead of poluting the automaton with the tree
encoding
Implement decision procédure using automata techniques,
use performances of the super-solver™ as a goal
Define aproximate, lineartime (on the fly) analyses to
perform more efficient evaluation