In the EMNLP paper we presented several experiments to test the performance of our method. This page provides the data, models and instructions needed to replicate those experiments. All the required material is distributed as a zip archive.
The software was built and tested on Ubuntu 10.04 and Ubuntu 12.04. On other systems, you may have to adapt the instructions below slightly. To run them, you definitely need:
Three datasets are employed in the experiments: the Groningen Meaning Bank for English, the PAISÀ corpus for Italian and the Twente News Corpus (TwNC) for Dutch. The first two corpora are freely redistributable, so we include their data in IOB format in the data directory of the experiments archive. The TwNC must be obtained independently.
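As a quick way to inspect the IOB data, the following sketch counts label frequencies. It assumes a simple two-column "character TAB label" layout with blank lines between units and the labels I, T, O and S that appear in the evaluation output below; the actual file layout in the archive may differ.

```python
from collections import Counter

def label_counts(lines):
    """Count IOB labels, skipping blank separator lines.

    Assumes each non-blank line ends in a tab-separated label;
    this layout is an assumption, not taken from the archive.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            counts[line.split("\t")[-1]] += 1
    return counts

# Tiny illustrative input: one sentence start, one token-internal
# character, one whitespace character, one token start.
sample = "H\tS\ni\tI\n \tO\n!\tT\n".splitlines()
print(sorted(label_counts(sample).items()))
# [('I', 1), ('O', 1), ('S', 1), ('T', 1)]
```

For a real file you would pass `open("data/dutch.iob", encoding="utf-8")` instead of the sample lines.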
Assuming that TwNC-0.2 has been downloaded and unpacked into the directory data/ext/TwNC-0.2, and that Go is installed on your system (see requirements above), the following commands extract and convert the part of the corpus used in the experiments. The commands must be issued in the root directory of the extracted experiments archive.
make generated/twnc/database/2000/20000112/ad20000112.iob
make generated/twnc/database/2000/20000112/dd20000112.iob
make generated/twnc/database/2000/20000112/gra20000112.iob
make generated/twnc/database/2000/20000112/nrc20000112.iob
make generated/twnc/database/2000/20000112/parool20000112.iob
make generated/twnc/database/2000/20000112/trouw20000112.iob
make generated/twnc/database/2000/20000112/volkskrant20000112.iob
make generated/twnc/database/2000/20000122/ad20000122.iob
make generated/twnc/database/2000/20000122/dd20000122.iob
make generated/twnc/database/2000/20000122/nrc20000122.iob
make generated/twnc/database/2000/20000122/parool20000122.iob
make generated/twnc/database/2000/20000122/trouw20000122.iob
make generated/twnc/database/2000/20000122/vnl20000122.iob
make generated/twnc/database/2000/20000122/volkskrant20000122.iob
for file in generated/twnc/database/2000/200001*2/*.iob; do cat $file; echo; done > data/dutch.iob
./src/scripts/seqsplit.py data/dutch.iob
To get the results published in Table 2 (Error rates obtained with different feature sets) and Table 3 (Using different context window sizes), a GNU Make makefile is provided. The name of the target should have the format generated/${DATASET}.${SPLIT}.${FEATURESET}${WINDOWSIZE}.eval, so for instance the command
$ make generated/dutch.dev.codecat9.eval
produces the results for the dev subset of the Dutch dataset, using both Unicode character codes and categories as features, with a window size of 9. The content of the file contains information about the errors:
$ cat generated/dutch.dev.codecat9.eval
Annotated units: 489291
Errors: 774
Error rate: 0.001582
        I      T      O     S
I  328099    498      0     2
T     237  80856      0    20
O       0      0  75234     0
S       3     14      0  4328
    fp   fn      tp  prec            rec             f1
I  240  500  328099  0.999269048148  0.998478388553  0.998873561889
T  512  257   80856  0.993707600039  0.996831580634  0.995267138927
O    0    0   75234  1.0             1.0             1.0
S   22   17    4328  0.994942528736  0.996087456847  0.9955146636
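The figures in the .eval file follow directly from the raw counts it reports; a short sketch recomputing the overall error rate and the per-class metrics for class I shows the definitions used:

```python
def metrics(tp, fp, fn):
    """Standard precision, recall and F1 from true/false positive
    and false negative counts."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1

# Overall error rate: errors / annotated units.
annotated, errors = 489291, 774
print(round(errors / annotated, 6))  # 0.001582

# Class I: tp, fp, fn taken from the table above.
prec, rec, f1 = metrics(tp=328099, fp=240, fn=500)
print(round(prec, 6), round(rec, 6), round(f1, 6))
# 0.999269 0.998478 0.998874
```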
To get the results published in Table 4 (Results obtained using different context window sizes and addition of SRN features), you first need to build the tools elman and wapiti, and make sure the required programs and scripts are on your PATH, like this:
$ make bin/elman bin/wapiti
$ export PATH=`pwd`/bin:`pwd`/src/scripts-srn:$PATH
Then change into the experiments-srn subdirectory:
$ cd experiments-srn
Here, another makefile is provided. The name of the target should have the format ${DATASET}/1.0/${FEATURESET}${WINDOWSIZE}-top10/${SPLIT}.eval so for instance the command
$ make dutch/1.0/codecat9-top10/dev.eval
produces the results for the dev subset of the Dutch dataset, using both Unicode character codes and categories as features, with a window size of 9. The content of the file contains information about the errors:
$ cat dutch/1.0/codecat9-top10/dev.eval
processed 489291 tokens with 414057 phrases; found: 414057 phrases; correct: 413922.
accuracy:  99.97%; precision:  99.97%; recall:  99.97%; FB1:  99.97
             I: precision:  99.99%; recall:  99.98%; FB1:  99.98  328560
             S: precision:  99.77%; recall:  99.70%; FB1:  99.74  4342
             T: precision:  99.89%; recall:  99.94%; FB1:  99.92  81155
135 489291 0.000276
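Since the targets follow the ${DATASET}/1.0/${FEATURESET}${WINDOWSIZE}-top10/${SPLIT}.eval scheme, a sweep over configurations can be scripted. The sketch below only prints the make commands rather than running them; the window sizes other than 9 are assumptions (only codecat9 is confirmed by the text above).

```shell
# Print candidate make targets for a sweep over window sizes and
# splits. Window sizes 1, 3 and 5 are assumed, not confirmed.
for w in 1 3 5 9; do
  for split in dev test; do
    echo "make dutch/1.0/codecat${w}-top10/${split}.eval"
  done
done
```

Piping the output into `sh` would run the sweep once the expected feature files are in place.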