-Compilation instruction:
+Dependencies
+------------
-compile the TextCollection :
+libxml++2.6-dev
+ocaml-findlib
+ocaml-ulex
+ocaml-nox
+camlp4
-cd XMLTree/TextCollection
-make
+and their dependencies, as well as ocamlbuild, which is part of the standard OCaml distribution.
+The "src" directory should contain a copy of the XMLTree source directory (this can be a symbolic
+link to the XMLTree directory).
-compile libcds :
+Build instructions
+-----------------
-cd XMLTree/libcds
-make
+Build the XMLTree library:
-compile XMLTree :
-cd XMLTree
-make
+cd src/XMLTree; make clean all; cd -
-compile xpathcomp/
-make
+Build the xpathcomp program:
+./configure
+./build
-you can compile with make DEBUG=true to get more statistics the overall programm is slower
-but you get precise timing for each individual function calls to the XMLTree interface.
+See ./build --help to customize the build process (build with debugging/profiling code, build OCaml
+bytecode instead of a native binary, etc.).
+Running ./build produces a main.native executable (a symlink to the real binary, which lies in the
+_build/src directory).
-Usage :
-./main 'file.xml' 'query' [output]
+Usage
+-----
+./main.native [options] 'input_file' 'query' [output_file]
-file.xml is the input file. There are some in the tests subdirectory. At the moment you can mainly test
-with base.xml and tiny.xml. Any file bigger than this will take too much time to load.
+input_file can be either an XML file (whose name must have a .xml extension) or an indexed file
+(with a .srx extension). Due to a bug in the parser, the query can only use the explicit
+(unabbreviated) syntax, that is:
+/descendant::a/child::b[ child::a and descendant::c or not(contains( ., "str")) ]/descendant::d
-output is optional. If specified, it is the name of a file to which the result of the query is serialized.
-If output is not given, the the result of the query is kept as a set of identifier and no access to
-the string collection is made.
+The abbreviated forms // and a/b/c are not accepted. Text predicates must be of the form
+function( ., "string") where function is one of: contains, equals, starts-with, ends-with.
+Available options are:
-There are a few flags:
-
--f sample factor [default=64]
--i index empty texts [default=false]
--d Disable text collection[default=false]
--help Display this list of options
---help Display this list of options
-
-
-for instance:
-./main -f 29 -d tests/tiny.xml '//para'
+ -c counting only (don't materialize the result set)
+ -f sample factor [default=64]
+ -i index empty texts [default=false]
+ -d disable text collection [default=false]
+ -s save the intermediate representation into file.srx
+ -b real bottom up run
+ -nj disable jumping
+ -index-type {default|swcsa|rlcsa} choose text index type
+ -v verbose mode
+ -help Display this list of options
+ --help Display this list of options
\ No newline at end of file