The Groningen Meaning Bank (GMB) consists of public domain English texts with corresponding syntactic and semantic representations. The GMB is developed at the University of Groningen.
The GMB supports deep semantics, opening the way to theoretically grounded, data-driven approaches to computational semantics. It integrates phenomena instead of covering single phenomena in isolation. This provides a better handle on explaining dependencies between various ambiguous linguistic phenomena, including word senses, thematic roles, quantifier scrope, tense and aspect, anaphora, presupposition, and rhetorical relations. In the GMB texts are annotated rather than isolated sentences, which provides a means to deal with ambiguities on the sentence level that require discourse context for resolving them.
The GMB is being built using a bootstrapping approach. We employ state-of-the-art NLP tools (notably the C&C tools and Boxer) to produce a reasonable approximation to gold-standard annotations. From release to release, the annotations are corrected and refined using human annotations coming from two main sources: experts who directly edit the annotations in the GMB via the Explorer, and non-experts who play a game with a purpose called Wordrobe.
The theoretical backbone for the semantic annotations in the GMB is established by Discourse Representation Theory (DRT), a formal theory of meaning developed by the philosopher of language Hans Kamp (Kamp, 1981; Kamp and Reyle, 1993). Extensions of the theory bridge the gap between theory and practice. In particular, we use VerbNet for thematic roles, a variation on ACE's named entity classification, WordNet for word senses and Segmented DRT for rhetorical relations (Asher and Lascarides, 2003). Thanks to the DRT backbone, all these linguistic phenomena can be expressed in a first-order language, enabling the practical use of first-order theorem provers and model builders.