∈"> ∉"> Empty" > Any" > ⟦" > ⟧" > L" > R"> ] >
Giuseppe Castagna1 | Hyeonseung Im2 |
Kim Nguyễn3 | Véronique Benzaken3 |
1 CNRS, PPS, Université Paris-Diderot, Paris, France
2 Kangwon National University, Chuncheon, Rep. of Korea
3 LRI, Université Paris-Sud, Orsay, France
W3C standard to query XML documents
declare function get_links($page, $print) {
for $i in $page/descendant::a[not(ancestor::b)]
return $print($i)
}
declare function pretty($link) {
typeswitch($link)
case $l as element(a)
return switch ($l/@class)
case "style1"
return <a href={$l/@href}><b>{$l/text()}</b></a>
default return $l
default return $link
}
let $bold_links := get_links(document("file.xhtml"), $pretty)
It's a pity since XML documents are very precisely typed (DTD, XMLSchemas)
Document type information is validated at runtime rather than checked statically
A polymorphic functional language equipped with semantic subtyping
let pretty ((<a>Any &rarrow; <a>Any) & (Any\<a>Any &rarrow; Any\<a>Any)) =
function
| <a class="style1" href=h ..> l &rarrow; <a href=h>[ <b>l ]
| x &rarrow; x
let get_links (page: <(Any)>Any) (print: <a>Any &rarrow; <a>Any) : [ <a>Any * ] =
match page with
<a>_ & x &rarrow; [ (print x) ]
| < (_\‘b) > l &rarrow;
(transform l with (i & <_>_) &rarrow; get_links i print)
| _ &rarrow; [ ]
Writing functions to traverse documents is painfull
A set &mathT; of types
t ::= b | c | t × t | t &rarrow; t | t &lor; t | t &land; t | t ∖ t | ⊤ | ⊥ | α
b : ranges over basic types (Int, String, …)
c : ranges over singleton types
(`A, 42, …)
Type constructors
Boolean connectives
α : type variables
types are interpreted co-inductively: recursive types and regular
expression types
t1 ≡ (Int × t1) &lor; t2 t2 ≡ (Bool × t2) &lor; (Bool × `nil)
t1 ≡ [ Int* Bool+ ]
t ≤ s &Lrarrow; [t] ⊆ [s]
[ ] interpretation of types as sets of
values
Allows to reason modulo semantic equivalence of type connectives :
[ Int* (Int | Bool*)? ] &land; [ Int+ (Bool+ | Int)* ] ≡ [Int+ Bool*]
The usual sets &mathV; of values:
v ::= 1 | … | `Foo | (v, v) | λx.e
Sequences are nested pairs (à la Lisp):
[ v1 … vn ] ≡ (v1, (…, (vn, `nil)))
XML documents are tagged sequences:
<foo>[ v1 … vn ] ≡ (`foo, [ v1 … vn ])
(Sometimes we write [ ] for the variant `nil)
Everything is built on top of products and variants
(a.k.a. the left-hand side of an arrow in a match … with)
p ::= t | x | (p, p) | p | p | p & p
t ranges over types
x ranges over capture variables
Pair patterns
Alternation |, Intersection &
patterns are also co-inductively interpreted (recursive patterns)
v / p : matching a value against a pattern yields a
substitution from variables to values
&lbag; p &rbag; : the set of values accepted by a
pattern is a type
t / p : matching a type against a pattern yields a
substitution from variables to types
Assume l has type [ Int+ Bool* ], consider:
match l with
[ _* (x & Int) Bool* (y & Bool) ] &rarrow; (x, y)
| [ _* (x & Int) ] &rarrow; (x, `false)
| [ ] &rarrow; (0, `false)
v ::= … | vδ δ ::= &bcirc; | &left;v · δ | &right;v · δ
fst (resp. snd) takes the first (resp. second) projection of a pair and update its zipper accordingly:
v1 ≡ (1, (2, (3, (4, `nil))))&bcirc; v11 ≡ fst v1 ≡ 1&left;(1, (2, (3, (4, `nil))))&bcirc; · &bcirc; v2 ≡ snd v1 ≡ (2, (3, (4, `nil)))&right;(1, (2, (3, (4, `nil))))&bcirc; · &bcirc; v3 ≡ snd v2 ≡ (3, (4, `nil))&right;v2 · &right;v1 · &bcirc;
up returns the head of the zipper:
up v3 ≡ v2 ≡ (2, (3, (4, `nil)))&right;(1, (2, (3, (4, `nil))))&bcirc; · &bcirc;
We extend the type-algebra with zipper types:
t ::= … | tτ τ ::= &bcirc; | &left;t · τ | &right;t · τ | τ &lor; τ | τ ∖ τ | &ztop;
&bcirc;: singleton type denoting the empty zipper (root element)
&ztop;: the top zipper type
Zipper types are interpreted co-inductively
Int(&left;⊤)* &bcirc; type of
integers that are the leftmost descendant of a tree
[ […] […] ]]]>&bcirc; type of
HTML documents
[ … ]]]> &ztop; types of links nested in any context
Since patterns contain types, we can check complex conditions:
Has a descendant <a>_: p ≡ <a>_ &lor; <_>[ _* p _* ] Deos not have an ancestor <b>_: τ ≡ &bcirc; &lor; &right;(⊤∖ <b>_) · τ &lor; &left;(⊤∖ <b>_) · τ
match v with
pτ & x &rarrow; …
| _ &rarrow; …
We want more, namely return all descendants (ancestors,
children, siblings, …) of a node matching a particular condition
Remark: (recursive) patterns already perform a recursive traversal
of the value
Idea: Piggy back on the traversal and accumulate
nodes in special variables
An operator is a 4-tupple (o, no, &rleadsto;o, &rarrow;o), where:
o: is the accumulator name
no: is the arity of o
&rleadsto;o:
&mathV;no &rsarrow; &mathV;, the reduction relation
&rarrow;o:
&mathT;no &rsarrow; &mathT;, the typing relation
An accumulator is a variable (ranged over
by ẋ, ẏ, …) with:
Op(ẋ): an operator
Init(ẋ) ∈ &mathV; : an initial value
v, v' &rleadsto;cons, (v, v')
v, `nil &rleadsto;snoc (v, `nil)
v, (v',v'') &rleadsto;snoc (v', snoc(v,v''))
Now we can use accumulators equipped with cons/snoc in patterns. Instead of matching a single node against a variable, it accumulates that node in sequence (in reverse or in-order).
σ ⊢ v / p &rleadsto; γ, σ'
σ, σ': mapping from accumulators to
values. Initially: σ = { ẋ ↦ Init(ẋ) | ẋ ∈ p }
v: input value
p: pattern
γ: mapping from capture variables to
values
Well known that typing path expressions escapes regular tree languages (i.e. &cduce;'s types). Consider:
t ≡ <c>[ <a>[] t <b>[] ] &lor; <c>[]
The set of all a or b labeled descendants is { [<a>[]n <b>[]n ] | n ≥ 0 } which is not a type.
Intuitively it means that when applying a
recursive pattern against a recursive type, we may generate an
infinite number of distinct types for an accumulator.
We use the typing relation of operators to introduce approximations:
t0, [ (t1 &lor; … &lor; tn)* ] &rarrow;cons [ (t0 &lor; t1 &lor; … &lor; tn)* ]
t0, [ (t1 &lor; … &lor; tn)* ] &rarrow;snoc [ (t0 &lor; t1 &lor; … &lor; tn)* ]
Ensures termination of typechecking of patterns.
Zippers (in values, types, patterns) are a conservative extension
self :: t ≡ (ẋ & t | _ )&ztop; (Init(ẋ) = [], Op(ẋ) = snoc)
child :: t ≡ <_>[ (ẋ & t | _ )* ]&ztop;
Example: applying child::<b>_ to the document
<doc>[ <a>[] <b>[] <c>[] <b>[] ]&bcirc;
<_>[ _ (ẋ & <b>_) _ (ẋ & <b>_)]&ztop;
ẋ↦ [ <b>[]&left;… &right;… &right;… &bcirc; <b>[]&left;… &right;… &right;… &right;… &right;… &bcirc; ]
descendant-or-self:: t ≡ X ≡ ((ẋ & t | _ ) & (<_>[ X * ])&ztop; | _ ) descendant :: t ≡ <_>[ (descendant-or-self::t)* ]&ztop;
We use regular expressions over basic &left;/&right; zippers to encode upward XPath
[ [
[]
[]
[ [] ]
]
]]]>
parent :: t ≡ ⊤ (&left;_) · (&right;_)* · (&right; ẋ & t) · (( (&left; _) · &ztop;) &lor; &bcirc; )
ancestor :: t ≡ ⊤ ( (&left;_) · (&right;_)* · (&right; ẋ & t) )* · &bcirc;
⬆ ⬆ ⬆ ⬆ parent
Still some problems in the on-going implementation
Thank you!