Here you can download the release of elephant. The distribution includes models for sentence and word boundary detection in English, Dutch and Italian. The English model is a snapshot of the model used to tokenise the Groningen Meaning Bank, taken on 6 September 2013. The Dutch model is trained on the same data as in the paper, but has been improved with additional n-gram features. The Italian model is the best-performing model in the experiments presented in the paper.
To install elephant, simply type:
$ make ; make install
This will compile the external tools wapiti and elman and copy the executable files to /usr/local/bin. To change the destination directory, edit the PREFIX variable in the Makefile. After installation, elephant is invoked as in these examples:
(PTB-style output)
$ echo 'Good morning Mr. President.' | elephant -m models/english
(IOB output format)
$ echo 'Good morning Mr. President.' | elephant -m models/english -f iob
It is also possible to run elephant from the source directory without installing it, by simply typing:
$ make
and then invoking the executable from the current directory, e.g.:
$ echo 'Good morning Mr. President.' | ./elephant -m models/english/
Good morning Mr. President .
The -f iob option makes elephant output a two-column format. Each line represents one character: the first column is its Unicode code point and the second is its assigned label.
$ echo 'Good morning Mr. President.' | ./elephant -m models/english/ -f iob
71 S
111 I
111 I
100 I
32 O
109 T
111 I
114 I
110 I
105 I
110 I
103 I
32 O
77 T
114 I
46 I
32 O
80 T
114 I
101 I
115 I
105 I
100 I
101 I
110 I
116 I
46 T
10 O
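The two-column output is easy to post-process. The following Python sketch turns it back into sentences of tokens; the function name decode_iob is hypothetical and not part of elephant. It assumes one code point and one label per line, separated by whitespace, and reads the labels as follows, judging from the example above: S marks the first character of a sentence (which also starts a token), T the first character of any other token, I a character inside a token and O a character outside any token.

```python
from typing import Iterable, List


def decode_iob(lines: Iterable[str]) -> List[List[str]]:
    """Rebuild sentences (lists of tokens) from two-column IOB output.

    Hypothetical helper. Assumed label meanings (inferred from the example):
      S = first character of a sentence (also starts a token)
      T = first character of any other token
      I = character inside the current token
      O = character outside any token (e.g. whitespace)
    """
    sentences: List[List[str]] = []
    token_chars: List[str] = []

    def flush_token() -> None:
        # Close the token being built, if any, and add it to the current sentence.
        if token_chars:
            if not sentences:  # tolerate input that does not begin with S
                sentences.append([])
            sentences[-1].append("".join(token_chars))
            token_chars.clear()

    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        codepoint, label = fields
        char = chr(int(codepoint))
        if label == "S":
            flush_token()
            sentences.append([])  # a new sentence starts at this character
            token_chars.append(char)
        elif label == "T":
            flush_token()
            token_chars.append(char)
        elif label == "I":
            token_chars.append(char)
        else:  # "O": the character belongs to no token
            flush_token()
    flush_token()
    return sentences
```

Applied to the output shown above, it yields a single sentence with the tokens Good, morning, Mr., President and the final full stop. Wrapping it in a small script that passes sys.stdin to decode_iob would let it sit at the end of an elephant pipeline.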
Elephant makes use of the wapiti sequence labelling toolkit. The source code of wapiti is included in the elephant distribution and is compiled automatically. It is also possible to compile only wapiti by typing
$ make wapiti
A statistical model for elephant is a directory containing two files, named wapiti and elman. The current release of elephant is bundled with three ready-to-use models. We selected the best-performing models according to our experiments, that is, Cat-Code-7-SRN for English and Dutch and Cat-Code-11-SRN for Italian. Additional details can be found in the paper.
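Since a model is just a directory holding these two files, a quick sanity check before passing a directory to -m can be sketched in Python. The helper name missing_model_files is hypothetical; it only checks that the files exist, not that they are valid wapiti or elman models.

```python
from pathlib import Path
from typing import List

# Per the description above, an elephant model directory holds two files
# named "wapiti" and "elman".
REQUIRED_MODEL_FILES = ("wapiti", "elman")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required model files that are absent from model_dir.

    Hypothetical helper: checks existence only, not file contents.
    """
    root = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (root / name).is_file()]
```

An empty list means both files are present; for example, missing_model_files("models/english") should return [] in an unpacked distribution.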
The bundle also includes a script, elephant-train, to facilitate training new models. It takes as input tokenized text in IOB format and a wapiti pattern file.
$ ./elephant-train
usage: elephant-train [-h] -m MODEL_DIR [-e ELMAN_MODEL] -w WAPITI_PATTERN_FILE -i INPUT_IOB_FILE [-d DEVEL_IOB_FILE]
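The exact layout of the training file is not documented here. Assuming it is the same two-column layout (code point, label) that -f iob produces, training data could be derived from raw text plus its gold tokenization with a sketch like the following. The functions encode_iob and format_iob are hypothetical helpers, not part of elephant, and the single-space column separator is also an assumption; check it against elephant-train before relying on it.

```python
from typing import List, Tuple


def encode_iob(text: str, sentences: List[List[str]]) -> List[Tuple[int, str]]:
    """Label every character of `text` given its gold tokenization.

    Hypothetical helper. Labels follow the S/T/I/O scheme seen in the
    -f iob output; tokens must occur in `text` verbatim and in order.
    """
    labels = ["O"] * len(text)  # characters not covered by any token stay "O"
    pos = 0
    for sentence in sentences:
        for i, token in enumerate(sentence):
            if not token:
                raise ValueError("empty token")
            start = text.find(token, pos)  # search from the end of the previous token
            if start < 0:
                raise ValueError(f"token {token!r} not found after offset {pos}")
            labels[start] = "S" if i == 0 else "T"  # first token opens the sentence
            for k in range(start + 1, start + len(token)):
                labels[k] = "I"
            pos = start + len(token)
    return [(ord(ch), label) for ch, label in zip(text, labels)]


def format_iob(pairs: List[Tuple[int, str]]) -> str:
    # Assumed file layout: "<code point> <label>" per line, single-space separated.
    return "".join(f"{cp} {label}\n" for cp, label in pairs)
```

Aligning each token with str.find from the end of the previous token keeps whitespace and other untokenized characters labelled O. For 'Good morning Mr. President.' followed by a newline, tokenized as Good | morning | Mr. | President | ., it reproduces the label sequence of the -f iob example above.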
Elephant is licensed under the terms of the two-clause BSD Licence:
Copyright (c) 2009-2013 CNRS
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The source code is available at .