A Core Calculus for XQuery 3.0

Combining Navigational and Pattern-Matching Approaches

Giuseppe Castagna¹	Hyeonseung Im²
Kim Nguyễn³	Véronique Benzaken³

1 CNRS, PPS, Université Paris-Diderot, Paris, France
2 Kangwon National University, Chuncheon, Rep. of Korea
3 LRI, Université Paris-Sud, Orsay, France

XQuery 3.0

W3C standard to query XML documents


   declare function local:get_links($page, $print) {
       for $i in $page/descendant::a[not(ancestor::b)]
       return $print($i)
   };

   declare function local:pretty($link) {
       typeswitch ($link)
       case $l as element(a)
            return switch ($l/@class)
                   case "style1"
                     return <a href="{$l/@href}"><b>{$l/text()}</b></a>
                   default return $l

       default return $link
   };

   let $bold_links := local:get_links(doc("file.xhtml"), local:pretty#1)
   return $bold_links

XQuery 3.0

Pros
+ standardized
+ nice declarative syntax for paths
Cons
- sometimes tedious to extract subtrees while preserving the structure
- no typechecking for functions (typechecking is optional in 3.0)

It's a pity, since XML documents are very precisely typed (DTD, XML Schema)

Document type information is validated at runtime rather than checked statically

CDuce

A polymorphic functional language equipped with semantic subtyping

   
  let pretty ((<a>Any → <a>Any)  &  (Any\<a>Any → Any\<a>Any)) =
      function
    | <a class="style1" href=h ..> l → <a href=h>[ <b>l ]
    | x → x


  let get_links (page: <(Any)>Any) (print: <a>Any → <a>Any) : [ <a>Any * ] =

      match page with
      <a>_ & x → [ (print x) ]
    | < (_\`b) > l →
                 (transform l with (i & <_>_) → get_links i print)
    | _ → [ ]

CDuce

Pros
+ Statically typed
+ compact (and efficient) type and value pattern-matching
Cons
- complex navigation encoded through explicit recursion
- no type inference for functions

Writing functions to traverse documents is painful

This work

Add support for path navigation to CDuce:
- Enrich the type algebra with zippers (à la Huet)
- Extend the pattern-matching construct to zipped values and types
- Encode path expressions as recursive patterns
Perform a type-directed translation from XQuery to CDuce

CDuce's type algebra

A set 𝒯 of types

    t ::=  b  |  c  |  t × t  |  t → t  |  t ∨ t  |  t ∧ t  |  t ∖ t  |  ⊤  |  ⊥  |  α

b : ranges over basic types (Int, String, …)
c : ranges over singleton types (`A, 42, …)
t × t, t → t : type constructors
∨, ∧, ∖, ⊤, ⊥ : Boolean connectives
α : type variables
Types are interpreted co-inductively, which yields recursive types and regular expression types:

      t₁ ≡ (Int × t₁)    ∨    t₂
      t₂ ≡ (Bool × t₂)  ∨ (Bool × `nil)

 t₁ ≡ [ Int* Bool+ ]

Semantic subtyping

t ≤ s   ⟺   ⟦t⟧ ⊆ ⟦s⟧

⟦ ⟧: interpretation of types as sets of values
This allows reasoning modulo semantic equivalence of type connectives:

      [ Int* (Int | Bool*)? ] ∧ [ Int+ (Bool+ | Int)* ] ≡ [ Int+ Bool* ]
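
The equivalence above can be checked by brute force on a finite fragment. The Python sketch below is not CDuce's subtyping algorithm: it assumes that each sequence type is modelled by an ordinary regular expression over a two-letter alphabet (`i` for Int, `b` for Bool), and it only explores words up to a hypothetical bound `MAX_LEN`.

```python
import re
from itertools import product

# Assumption: Int is modelled by the letter "i" and Bool by the letter "b",
# so a regular expression type becomes an ordinary regex over {i, b}.
T1 = re.compile(r"i*(i|b*)?")    # [ Int* (Int | Bool*)? ]
T2 = re.compile(r"i+(b+|i)*")    # [ Int+ (Bool+ | Int)* ]
T3 = re.compile(r"i+b*")         # [ Int+ Bool* ]

MAX_LEN = 8  # hypothetical bound: only a finite fragment is explored


def words(max_len):
    """Enumerate every word over {i, b} of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product("ib", repeat=n):
            yield "".join(letters)


def interpretation(regex, max_len=MAX_LEN):
    """Finite approximation of the set of values denoted by a type."""
    return {w for w in words(max_len) if regex.fullmatch(w)}


def equivalent_up_to_bound(max_len=MAX_LEN):
    """Compare the interpretation of T1 ∧ T2 with that of T3."""
    both = interpretation(T1, max_len) & interpretation(T2, max_len)
    return both == interpretation(T3, max_len)
```

Intersection of types becomes set intersection of interpretations, which is the essence of the semantic approach; an agreement up to the bound is evidence for the equivalence, not a proof.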

CDuce data-model

The usual set 𝒱 of values:

       v ::= 1  |  …  |  `Foo  |   (v, v)  |  λx.e

Sequences are nested pairs (à la Lisp):

[ v₁  … v_n ] ≡ (v₁, (…, (v_n, `nil)))

XML documents are tagged sequences:

<foo>[ v₁  … v_n ] ≡ (`foo, [ v₁  … v_n ])

(Sometimes we write [ ] for the variant `nil)

Everything is built on top of products and variants
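
As a minimal sketch of this encoding, the following Python functions build sequences and XML elements out of pairs and variants. The representation is an assumption made for illustration: a variant is modelled as a string starting with a backquote and a pair as a Python 2-tuple; the names `seq`, `elem` and `to_list` are hypothetical helpers, not part of CDuce.

```python
NIL = "`nil"  # assumption: a variant is modelled as a string starting with a backquote


def seq(*values):
    """Encode [ v1 ... vn ] as the nested pairs (v1, (..., (vn, `nil)))."""
    result = NIL
    for v in reversed(values):
        result = (v, result)
    return result


def elem(tag, *children):
    """Encode <tag>[ v1 ... vn ] as the pair (`tag, [ v1 ... vn ])."""
    return ("`" + tag, seq(*children))


def to_list(s):
    """Decode a nested-pair sequence back into a Python list."""
    out = []
    while s != NIL:
        head, s = s
        out.append(head)
    return out
```

For instance, `elem("foo", 1, 2)` builds the pair whose first component is the variant for `foo` and whose second component is the nested-pair sequence of 1 and 2.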

CDuce patterns

(a.k.a. the left-hand side of an arrow in a match … with)

    p ::=  t  | x | (p, p) |  p | p  |  p & p

t ranges over types
x ranges over capture variables
Pair patterns
Alternation |, Intersection &
patterns are also co-inductively interpreted (recursive patterns)

v / p : matching a value against a pattern yields a substitution from variables to values
⟅ p ⟆ : the set of values accepted by a pattern is a type
t / p : matching a type against a pattern yields a substitution from variables to types

CDuce patterns (example)

Assume l has type [ Int+ Bool* ], consider:


       match l with
       [ _* (x & Int) Bool* (y & Bool) ] →  (x, y)
    |  [ _* (x & Int) ]                  →  (x, `false)
    |  [ ]                               →  (0, `false)

⟅[ _* (x & Int) Bool* (y & Bool) ]⟆ ≡ [ ⊤* Int Bool+ ]
{ x ↦ Int, y ↦ Bool }
⟅[ _* (x & Int) ]⟆ ≡ [ ⊤* Int ]
{ x ↦ Int }
Since [ Int+ Bool* ] ∖ ( [ ⊤* Int Bool+ ] ∨ [ ⊤* Int ] ) ≡ ⊥
the third case is unreachable.
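
The unreachability argument can be replayed on concrete inputs. The sketch below simulates the three branches in Python under stated assumptions: a sequence is a Python list of `int` and `bool` values, each sequence pattern is approximated by a regex over the letters `i` and `b`, and `match` and `branches_used` are hypothetical names. It tests inputs only up to a small bound, so it illustrates the static result rather than proving it.

```python
import re
from itertools import product

# Assumption: a sequence is a Python list of int and bool values; it is
# summarised by a word over {i, b} so that sequence patterns become regexes.
CASE1 = re.compile(r"[ib]*ib+")   # [ _* (x & Int) Bool* (y & Bool) ]
CASE2 = re.compile(r"[ib]*i")     # [ _* (x & Int) ]


def shape(l):
    """Map each element to 'b' (Bool) or 'i' (Int); bool is tested first
    because Python's bool is a subclass of int."""
    return "".join("b" if isinstance(v, bool) else "i" for v in l)


def match(l):
    """Return (result, index of the branch taken), with a first-match policy."""
    word = shape(l)
    if CASE1.fullmatch(word):
        last_int = max(k for k, c in enumerate(word) if c == "i")
        return (l[last_int], l[-1]), 1   # x = last Int, y = last Bool
    if CASE2.fullmatch(word):
        return (l[-1], False), 2         # x = last Int
    if not l:
        return (0, False), 3
    raise ValueError("no branch matches")


def branches_used(max_len=6):
    """Branches reached by inputs of type [ Int+ Bool* ] with bounded length."""
    used = set()
    for n_int, n_bool in product(range(1, max_len), range(max_len)):
        used.add(match([1] * n_int + [True] * n_bool)[1])
    return used
```

On every tested input of type `[ Int+ Bool* ]`, only the first two branches are taken, which is what the type difference computed above predicts.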

Zippers (1/2)

Introduced in 1997 by Gérard Huet
Stack of visited nodes
Push the current node on the stack when traversing a pair
Take the top of the stack to go backward
Tag the elements of the stack to remember which component of a pair we have visited

 v ::=  …  |  v_δ
 δ ::=  •  |  L v · δ  |  R v · δ

Zippers (2/2)

fst (resp. snd) takes the first (resp. second) projection of a pair and updates its zipper accordingly:

      v₁ ≡ (1, (2, (3, (4, `nil))))_•
      v₁₁ ≡ fst v₁ ≡ 1_{L (1, (2, (3, (4, `nil))))_• · •}
      v₂ ≡ snd v₁ ≡ (2, (3, (4, `nil)))_{R (1, (2, (3, (4, `nil))))_• · •}
      v₃ ≡ snd v₂ ≡ (3, (4, `nil))_{R v₂ · R v₁ · •}

up returns the value stored at the head of the zipper:

      up v₃ ≡ v₂ ≡ (2, (3, (4, `nil)))_{R (1, (2, (3, (4, `nil))))_• · •}
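
The three operations can be sketched in a few lines of Python. This is an illustrative model, not the CDuce implementation: the class names `Frame` and `Zipped` are hypothetical, pairs are Python 2-tuples, and `None` stands for the empty zipper •.

```python
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    """One stack entry: the component taken ("L" or "R") and the zipped pair it came from."""
    side: str
    parent: "Zipped"
    rest: Optional["Frame"]      # None plays the role of the empty zipper


class Zipped(NamedTuple):
    """A value v annotated with its zipper δ (written v_δ)."""
    value: object
    zipper: Optional[Frame] = None


def fst(v: Zipped) -> Zipped:
    """First projection; pushes L v on the zipper."""
    return Zipped(v.value[0], Frame("L", v, v.zipper))


def snd(v: Zipped) -> Zipped:
    """Second projection; pushes R v on the zipper."""
    return Zipped(v.value[1], Frame("R", v, v.zipper))


def up(v: Zipped) -> Zipped:
    """Return the zipped value stored at the head of the zipper."""
    if v.zipper is None:
        raise ValueError("already at the root")
    return v.zipper.parent
```

Because each frame stores the whole zipped parent, `up` is a constant-time lookup, and applying `up` to `snd (snd v₁)` gives back `snd v₁` together with its own zipper.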

Zipper types

We extend the type-algebra with zipper types:

 t ::=  …  |  t_τ
 τ ::=  •  |  L t · τ  |  R t · τ  |  τ ∨ τ  |  τ ∖ τ  |  ⊤_z

•: singleton type denoting the empty zipper (root element)
⊤_z: the top zipper type
Zipper types are interpreted co-inductively

Int_{(L ⊤)* •} type of integers that are the leftmost descendant of a tree

<html>[ <head>[…] <body>[…] ]_• type of HTML documents

<a>[ … ]_{⊤_z} type of links nested in any context

Tree navigation

Since patterns contain types, we can check complex conditions:

   Has a descendant <a>_:
     p ≡ <a>_   ∨   <_>[ _* p _* ]
 
  Does not have an ancestor <b>_:
    τ ≡ •   ∨   R(⊤∖<b>_) · τ   ∨   L(⊤∖<b>_) · τ


    match v with
       p_τ & x → …
   | _        → …

We want more, namely return all descendants (ancestors, children, siblings, …) of a node matching a particular condition

Remark: (recursive) patterns already perform a recursive traversal of the value
Idea: piggyback on the traversal and accumulate nodes in special variables

Operators and Accumulators

An operator is a 4-tuple (o, n_o, ⇝_o, →_o), where:

o: the operator name
n_o: the arity of o
⇝_o: 𝒱^{n_o} → 𝒱, the reduction relation
→_o: 𝒯^{n_o} → 𝒯, the typing relation

An accumulator is a variable (ranged over by ẋ, ẏ, …) with:

Op(ẋ): an operator
Init(ẋ) ∈ &mathV; : an initial value

Some operators

    v, v' ⇝^cons (v, v')

    v, `nil ⇝^snoc (v, `nil)

    v, (v', v'') ⇝^snoc (v', snoc(v, v''))

Accumulators equipped with cons or snoc can now be used in patterns: instead of binding a single node to a variable, the pattern accumulates every matched node in a sequence (in reverse order with cons, in document order with snoc).
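
The two reduction relations translate directly into Python. The sketch assumes that sequences are nested 2-tuples terminated by a string standing for the variant `nil; the helper `accumulate` is hypothetical and only mimics a traversal that feeds matched nodes to the operator one by one.

```python
NIL = "`nil"  # assumption: the variant `nil is modelled as a string


def cons(v, acc):
    """v, v' ⇝ (v, v'): prepend v to the accumulated sequence."""
    return (v, acc)


def snoc(v, acc):
    """Append v at the end of the nested-pair sequence acc."""
    if acc == NIL:
        return (v, NIL)                  # v, `nil ⇝ (v, `nil)
    head, tail = acc
    return (head, snoc(v, tail))         # v, (v', v'') ⇝ (v', snoc(v, v''))


def accumulate(op, nodes, init=NIL):
    """Fold a stream of matched nodes into an accumulator, starting from init."""
    acc = init
    for node in nodes:
        acc = op(node, acc)
    return acc
```

Feeding the nodes 1, 2, 3 in traversal order gives the sequence 3, 2, 1 with cons and 1, 2, 3 with snoc.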

Pattern matching semantics (simplified)

  σ ⊢ v / p ⇝ γ, σ'

σ, σ': mapping from accumulators to values. Initially: σ = { ẋ ↦ Init(ẋ) | ẋ ∈ p }
v: input value
p: pattern
γ: mapping from capture variables to values

  v ∈ ⟦t⟧
  -------------------- (type)
  σ ⊢ v / t ⇝ ∅, σ

  ------------------------------------------ (acc)
  σ ⊢ v / ẋ ⇝ ∅, σ[ ẋ := Op(ẋ)(v, σ(ẋ)) ]

  -------------------------- (var)
  σ ⊢ v / x ⇝ { x ↦ v }, σ

  σ ⊢ (fst v) / p₁ ⇝ γ₁, σ'      σ' ⊢ (snd v) / p₂ ⇝ γ₂, σ''
  ------------------------------------------------------------ (pair)
  σ ⊢ v / (p₁, p₂) ⇝ γ₁ ∪ γ₂, σ''

Remember: if v ≡ (v₁, v₂)_δ then fst v ≡ v₁_{L v · δ} and snd v ≡ v₂_{R v · δ}
(some other rules for alternation, failure, recursion, etc.)
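
The rules above can be read as a small recursive interpreter. The following Python sketch makes several assumptions that are not in the slides: patterns are encoded as tagged tuples, a type is a Python predicate, σ and γ are dictionaries, zippers are omitted, and alternation is given a first-match semantics with failure signalled by an exception.

```python
class MatchFailure(Exception):
    """Raised when a value is not accepted by a pattern."""


def match(sigma, v, p, ops):
    """Return (gamma, sigma') for the judgement  sigma ⊢ v / p ⇝ gamma, sigma'.

    Hypothetical pattern encoding: ("type", predicate), ("acc", name),
    ("var", name), ("pair", p1, p2), ("or", p1, p2).
    ops maps each accumulator name to its binary operator.
    """
    kind = p[0]
    if kind == "type":                       # rule (type)
        if not p[1](v):
            raise MatchFailure(v)
        return {}, sigma
    if kind == "acc":                        # rule (acc)
        name = p[1]
        return {}, {**sigma, name: ops[name](v, sigma[name])}
    if kind == "var":                        # rule (var)
        return {p[1]: v}, sigma
    if kind == "pair":                       # rule (pair): thread sigma left to right
        if not (isinstance(v, tuple) and len(v) == 2):
            raise MatchFailure(v)
        gamma1, sigma1 = match(sigma, v[0], p[1], ops)
        gamma2, sigma2 = match(sigma1, v[1], p[2], ops)
        return {**gamma1, **gamma2}, sigma2
    if kind == "or":                         # alternation: try p1, then p2
        try:
            return match(sigma, v, p[1], ops)
        except MatchFailure:
            return match(sigma, v, p[2], ops)
    raise ValueError(f"unknown pattern: {kind}")
```

Since σ is never mutated in place, accumulator updates made by a failing alternative are discarded automatically, which is one simple way to obtain the backtracking behaviour that alternation needs.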

Typing of patterns (with accumulators) 1/2

It is well known that the results of path expressions can fall outside regular tree languages (i.e. CDuce's types). Consider:

      t ≡ <c>[ <a>[] t <b>[] ]    ∨   <c>[]

The set of all a- or b-labeled descendants is { [ <a>[]ⁿ <b>[]ⁿ ] | n ≥ 0 }, which is not a type.

Intuitively, this means that when matching a recursive pattern against a recursive type, we may generate an infinite number of distinct types for an accumulator.
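
A concrete run makes the counter-example tangible. The Python sketch below assumes that an element is a pair of a tag and a list of children; `make_t` and `ab_descendants` are hypothetical helper names. It builds values of type t and collects their a- or b-labeled descendants in document order.

```python
def make_t(n):
    """Build the value of type t made of n nested <c> elements around <c>[]."""
    doc = ("c", [])
    for _ in range(n):
        doc = ("c", [("a", []), doc, ("b", [])])
    return doc


def ab_descendants(node):
    """Tags of all a- or b-labeled descendants of node, in document order."""
    found = []
    for child in node[1]:
        if child[0] in ("a", "b"):
            found.append(child[0])
        found.extend(ab_descendants(child))
    return found
```

For a nesting depth of n the result is n occurrences of a followed by n occurrences of b, a language that no regular expression type can describe exactly.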

Typing of patterns (with accumulators) 2/2

We use the typing relation of operators to introduce approximations:

    t₀, [ (t₁ ∨ … ∨ t_n)* ] →^cons [ (t₀ ∨ t₁ ∨ … ∨ t_n)* ]

    t₀, [ (t₁ ∨ … ∨ t_n)* ] →^snoc [ (t₀ ∨ t₁ ∨ … ∨ t_n)* ]

This ensures that typechecking of patterns terminates.
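
One way to see why this approximation helps termination is to model the accumulator type `[ (t₁ ∨ … ∨ t_n)* ]` by the finite set of its element types. The Python sketch below relies on that assumption, treats element types as opaque strings, and uses a hypothetical driver `type_accumulator`; it is a simplified illustration, not the typing algorithm of the calculus.

```python
def cons_type(t0, acc_type):
    """Typing of cons (and snoc): t0, [ (t1|...|tn)* ]  →  [ (t0|t1|...|tn)* ].

    Assumption: the type [ (t1|...|tn)* ] is modelled by the frozenset {t1, ..., tn}.
    """
    return acc_type | frozenset([t0])


def type_accumulator(node_types, init=frozenset()):
    """Iterate the typing of an accumulator until it stabilises.

    node_types is the finite collection of types of the nodes that the
    pattern may accumulate; the number of rounds is returned as well.
    """
    acc, rounds = init, 0
    while True:
        new = acc
        for t in node_types:
            new = cons_type(t, new)
        rounds += 1
        if new == acc:
            return acc, rounds
        acc = new
```

Because the set can only grow and is bounded by the finitely many node types, the iteration reaches a fixpoint. The price is precision: for the example with nested a and b elements the result stands for `[ (<a>[] | <b>[])* ]` rather than for the exact, non-regular set.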

Results

Zippers (in values, types, patterns) are a conservative extension

Subtyping and typechecking are extended straightforwardly
Typing of patterns introduces sound approximations only for accumulators
Provided the operators are sound, the whole language remains type-safe

Downward XPath axes

     self :: t ≡    (ẋ & t | _ )_{⊤_z}                                (Init(ẋ) = [], Op(ẋ) = snoc)

     child :: t ≡  <_>[ (ẋ & t | _ )* ]_{⊤_z}

Example: applying child::<b>_ to the document

      <doc>[ <a>[]    <b>[]    <c>[]    <b>[] ]_•
        <_>[   _    (ẋ & <b>_)   _     (ẋ & <b>_)]_{⊤_z}

         ẋ ↦ [ <b>[]_{L… R… R… •}    <b>[]_{L… R… R… R… R… •}   ]

     descendant-or-self :: t ≡   X ≡ ((ẋ & t | _ )  &  (<_>[ X* ])_{⊤_z} | _ )

     descendant :: t ≡ <_>[ (descendant-or-self::t)* ]_{⊤_z}
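
The behaviour of these four axes can be mimicked with plain recursive functions. The Python sketch below is only a functional reading of the patterns, under stated assumptions: an element is a pair of a tag and a list of children, the node test t is a Python predicate, results are accumulated in document order (as with snoc), and zippers are left out.

```python
def self_axis(node, test):
    """self::t keeps the node itself when it satisfies the test."""
    return [node] if test(node) else []


def child(node, test):
    """child::t keeps the children that satisfy the test, in order."""
    return [c for c in node[1] if test(c)]


def descendant_or_self(node, test):
    """descendant-or-self::t visits the node, then each subtree recursively."""
    found = self_axis(node, test)
    for c in node[1]:
        found.extend(descendant_or_self(c, test))
    return found


def descendant(node, test):
    """descendant::t applies descendant-or-self::t to every child."""
    found = []
    for c in node[1]:
        found.extend(descendant_or_self(c, test))
    return found
```

Applied to the `<doc>` example with a test for the tag `b`, `child` returns the two `<b>[]` elements, matching the content of the accumulator shown above (minus the zippers).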

Binary-tree encoding

We use regular expressions over basic L/R zippers to encode upward XPath

   [ [
          []
          []
          [  [] ]
        ]
   ]

Upward XPath axes

Other results

Conclusion, thoughts and future work