1 <?xml version="1.0" encoding="utf-8" ?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
5 <!ENTITY in "<small style='font-size:small'>∈</small>">
6 <!ENTITY notin "<small style='font-size:small'>∉</small>">
10 <html xmlns="http://www.w3.org/1999/xhtml" >
12 <title>Tatoo : the path forward</title>
14 <meta http-equiv="Content-Type"
15 content="text/html; charset=utf-8" />
16 <meta name="copyright"
17 content="Copyright © 2013 Kim Nguyễn" />
20 <script src="../jquery-2.0.3.min.js" type="text/javascript" ></script>
21 <script src="../libs/raphael-min.js" type="text/javascript" ></script>
22 <!-- Load the library -->
23 <script src="../simpleWebSlides.js" type="text/javascript" ></script>
25 <link rel="stylesheet" href="../simpleWebSlides.css" type="text/css" media="all" />
26 <!-- Load a custom Theme, the class-element marks this style-sheet
27 a "theme" that can be swtiched dynamicaly -->
28 <link class="sws-theme" rel="stylesheet" title="U-Psud style" href="../themes/uPsud.css" type="text/css" />
30 <!-- Customize some templates and initialize -->
32 <script type="text/javascript">
33 SWS.Config['sws-slide-change'] = SWS.Effects.slideChangeFadeOutIn;
34 SWS.Config['sws-object-deactivate'] = SWS.Effects.objectDeactivateFadeOut;
35 SWS.Config['sws-object-activate'] = SWS.Effects.objectActivateFadeIn;
37 //Ensures that we load SWS at the very end, after MathJax has
40 $(window).load(SWS.Presentation.init);
44 <div class="sws-slide sws-cover sws-option-nofooter">
45 <h1>Tatoo : the path forward</h1>
46 <h3>TYPEX meeting, Grenoble April 14-16 2014</h3>
47 <a href="mailto:kn@lri.fr">kn@lri.fr</a><br/>
48 <a href="http://www.lri.fr/~kn/">http://www.lri.fr/~kn</a>
51 <h1>XPath in nutshell</h1>
52 <div class="sws-slide">
54 <p>W3C <em>query</em> (sub-)language. Allows one to
55 denote a <em>set of nodes</em> in a document</p>
57 /<span class="step"><span class="axis">descendant</span>::<span class="test">M</span></span>/<span class="step"><span class="axis">ancestor</span>::<span class="test">A</span><span class="filter">[ <span class="axis">descendant</span>::<span class="test">P</span> or <span class="axis">parent</span>::<span class="test">Q </span>]</span></span>/<span class="step"><span class="axis">child</span>::<span class="test">*</span></span></code>
59 <li class="sws-pause">A path is made of several
60 steps <span style="color:green;">steps</span></li>
61 <li > A step is made of :
63 <li class="sws-pause">An <span style="color:orange;">axis</span> amongst
64 : <tt>self</tt>, <tt>child</tt>, <tt>parent</tt>, <tt>descendant</tt>, <tt>ancestor</tt>, <tt>descendant-or-self</tt>, <tt>ancestor-or-self</tt>, <tt>following-sibling</tt>, <tt>preceding-sibling</tt>,
66 <li class="sws-pause">A <span style="color:purple">node test</span> : label
67 name, <tt>*</tt>, <tt>text()</tt>, <tt>node()</tt>, …</li>
68 <li class="sws-pause"> An optional <span style="color:#88f">filter</span> : an XPath expression whose
69 truth value determine is the node is kept</li>
73 <script type="text/javascript">
75 var col_change = function (sel, col) {
76 return function (canvas) {
77 canvas.find(".xpath *").css("color","");
78 var objs = canvas.find(sel);
79 objs.css("color", col);
82 var reg = SWS.Presentation.registerCallback;
83 reg ("0", col_change(".xpath *",""));
84 reg ("1", col_change(".step","green"));
85 reg ("2", col_change(".axis","orange"));
86 reg ("3", col_change(".test","purple"));
87 reg ("4", col_change(".filter","#88f"));
92 <div class="sws-slide">
93 <h1>Filters (or predicates)</h1>
94 <p>An <em>arbitrary expression</em> whose value is cast into a
95 Boolean (<tt>true</tt> keeps the selected node, <tt>false</tt>
99 //A[ child::B and .//* = count(/descendant::X) mod 12 ]
102 <li><tt>*</tt> is an alias for <tt>child::*</tt></li>
103 <li><tt>//</tt> is an alias for <tt>/descendant-or-self::node()/</tt></li>
104 <li><tt>.</tt> is an alias for <tt>self::node()</tt></li>
106 <p class="sws-pause" style="background:white">BTW: XPath semantics
107 is <span style="text-decoration:line-through">crazy</span> <em>rich</em>. In
108 the example we want all <tt>A</tt> nodes (ok) that have
109 a <tt>B</tt> child (ok) and <span class="sws-pause">that have a
111 </span><span class="sws-pause">for which the (string) concatenation
112 of its text descendants (?) is equal to the number
113 of <tt>X</tt> elements in the document <tt>mod 12</tt>
117 <h1>Tatoo : what is it ? how does it work ?</h1>
118 <div class="sws-slide">
120 <p>Succint XML Self-Index: (limited) XPath query engine developped during my
121 postdoc at Nicta. It uses:
123 <ul><li><em>Self-Index</em> developped by Mäkinen, Sirén, Valimaki :
124 stores a text <tt>T</tt> of <tt>N</tt> characters (in an
125 alphabet of size <tt>σ</tt>) in <tt>N·H<sub>k</sub>(T)
126 + o(N·log σ) bits</tt>.<br/>
127 Can answer queries like « <i>first position after i that
128 matches a constant string <tt>s</tt></i> » in <tt>O(|s|·log σ)</tt>
130 <li><em>Succinct tree index</em> of Arroyuello, Claude, Navarro :
131 stores a tree of size <tt>M</tt> with labels in an alphabet of
132 size <tt>θ</tt> in <tt>4M(2 + log θ) + o(M) bits</tt>. <br/>
133 Full navigation + jumping from a node to the next one with a
134 given label (and also LCA of two nodes) in O(1).
136 <li><em>Non det. alternating tree-automata</em> of Maneth and
138 fly determinization and approximation of the sets of relevant
139 nodes (worst case <tt>O(M·|A|)</tt> to evaluate an automaton
143 <p style="background:white">Can answer downward (no ancestor, parent, …) XPath with text
144 expressions of the form (<tt>f(., <i>constant</i>)</tt>,
145 with <tt>f ∈ { contains, starts-with, ends-with, =, <, > ,
146 <=, >= }</tt>). Translation from XPath to automata is
149 <div class="sws-slide">
150 <h1>Pros and cons of SXSI</h1>
152 <ul><li>For the considered subset: <em>much faster</em> than state of the
153 art optimized XML DB</li>
154 <li>Very <em>small space overhead</em></li>
155 <li>Theoreticall sound formalism, <em>proven worst-case
159 <ul><li>The subset is <s>too restricted</s> (no upward axes, no
160 arbitrary joins)</li>
161 <li> Encoding was <s>wrong</s> in subtle ways (e.g. in some corner
162 cases with double negations in filters)
167 <div class="sws-slide">
168 <h1>Tree Automata TOOlkit</h1>
169 <p>Ré-implmentation of the automata part of SXSI</p>
171 <li>Supports <em>full navigational XPath</em> (upward axis)</li>
172 <li>Retain polynomial (combined complexity) running
173 time, <i>i.e.</i> <tt>O(|Doc|·|Q|)</tt> for an XPath
178 <div class="sws-slide">
179 <h1>How does it work ?</h1>
180 <p>Sample document: <tt>alphabet.xml</tt>
182 <iframe src="alphabet.xhtml" style="width:99%;height:320pt;border-style:none;"
185 <div class="sws-slide">
186 <h1>Example 1 : <tt>/descendant::M</tt></h1>
187 <iframe src="ex1.html" style="margin:0pt 5% 0pt 5%;width:90%;height:320pt;border-style:none;"
190 <div class="sws-slide">
192 <code> /descendant::M/ancestor::A[ descendant::P or parent::Q ]/child::*</code>
193 <iframe src="ex2.html" style="margin:0pt 5% 0pt 5%;width:90%;height:320pt;border-style:none;"
197 <h1>Open problems, todos and future directions</h1>
198 <div class="sws-slide">
199 <h1>Todos/open problems</h1>
201 <li>Update the analysis of <em>relevant nodes</em> to automata
202 with upward moves to deduce nodes that can be skipped</li>
203 <li>Use the succinct index</li>
204 <li>Add joins (automata product ?)</li>
205 <li>Prove the correctness of the translation (with Matthieu,
207 <li>Polynomial complexity relies on the fact that the
208 automaton is <em>ranked</em> (vague notion, only implemented
209 not formalized )</li>
210 <li>Automata now looks like <em>μ-calculus</em> formulæ
211 (historically, I chose automata because I did not know much
212 about logics and automata <i>felt</i>
213 more <em>execution-oriented</em>)</li>
214 <li>Implement static analysis through decision procedure:
215 emptiness, non-reachability etc.</li>
218 <div class="sws-slide">
219 <h1>Future directions</h1>
221 <li>Probably want to ditch automata and used formula
223 <li>see if cycle-freeness of formula implies the ranking
224 property (I think so) </li>
225 <li>Like for formula, use dedicated moves/modalities for
226 attributes instead of poluting the automaton with the tree
229 <li>Implement decision procédure using automata techniques,
230 use performances of the super-solver™ as a goal
232 <li>Define aproximate, lineartime (on the fly) analyses to
233 perform more efficient evaluation
235 <li class="sws-pause">Your item of choice here</li>