Generative Lexicon Theory: Integrating Theoretical and Empirical Methods


What: Summer Course at NASSLLI 2016

When and Where: July 11-15, 2016: Rutgers, New Brunswick, N.J.

Who: James Pustejovsky and Elisabetta Ježek

Material: Slides and Articles from the Course (here)


In this tutorial we present an introduction to Generative Lexicon Theory. The overall aim is to acquaint the student with the basic assumptions and components of the theory and motivate theoretical decisions through evidence-based analysis over large linguistic datasets. We show how the theory models the interaction between lexical information and other components of grammar, in particular how it mediates various problems in the mapping from lexical semantic representations to syntactic forms and, to a lesser extent, to pragmatic interpretation. From a computational perspective, we highlight the applicability of GL to natural language processing tasks such as Qualia extraction, compound classification, event identification, and metonymy resolution.

GL theory was conceived from the outset as an infrastructure for a lexically based semantic theory of language, founded on a rich compositional procedure integrating mechanisms for modulation of word meaning in context. It has won widespread acceptance among linguists and computer scientists of different theoretical backgrounds, and established itself as a productive and typologically adequate paradigm for linguistic research in a wide range of areas, such as event semantics, theory of argument structure, lexical and computational semantics, and Natural Language Processing.

The original full statement of the theory was presented in Pustejovsky (1995), but there have been significant developments since then, including the elaboration of a general theory of semantic selection and semantic typing (Asher and Pustejovsky 2006; Pustejovsky 2011). These developments have enhanced the explanatory power of the theory and extended its coverage of linguistic phenomena. Moreover, since 2000 the theory has drawn increasingly on the findings and procedures of corpus linguistics and distributional semantic analysis (Pustejovsky and Jezek 2008; Pustejovsky and Rumshisky 2008; Jezek and Quochi 2010; Jezek and Vieu 2014). This has created a new dimension of evidence-based analysis and interpretation, giving rise to an integration of empirical analysis and theoretical modeling. For all these reasons, an introductory tutorial illustrating how GL principles can be put into practice in linguistic analysis will benefit students and researchers interested in both theoretical linguistics and computational semantics.

The plan of the course is as follows. In the first lecture, we review the motivations behind GL and the notion of a distributed compositional model of language meaning. We sketch out the basic assumptions underlying GL theory and justify these assumptions in general terms. In lecture two, we examine the qualia structure and its role in differentiating the semantic micro-structure of word meaning. Lecture three focuses on argument distinctions and argument typing and examines default realization of the different types of arguments. Lecture four examines event structure and discusses event type shifts as attested in the corpus. Finally, in lecture five, we look in detail at GL’s compositional mechanisms of coercion and co-composition. We situate this last lecture in the context of data from large linguistic corpora, and investigate the computational consequences of the GL architecture for modeling compositionality and determining meaning in context.

There will be labs associated with the lectures, covering corpus evidence and analytics for qualia extraction, compound interpretation, coercion, and event typing.


1. Introduction to GL
1.1. Basic Assumptions
1.2. Polysemy and Ambiguity in GL
1.3. Basic GL Concepts
1.4. GL Notational Language

2. Qualia Structure
2.1. Basic Assumptions
2.2. Qualia Roles
2.3. Conventionalized Attributes

  • Lab on Qualia identification and extraction
  • Lab on Qualia relation tagging in compounds

3. Argument Structure
3.1. Basic Assumptions
3.2. Argument Types
3.3. Adjuncts

  • Lab on induction of Semantic types (from corpora using corpus analytics)
    and identification of types of arguments
  • Lab on meaning variation due to semantic type shifting

4. Event Structure
4.1. Basic Assumptions
4.2. Event Types
4.3. Event Composition

  • Lab on identification of event types
  • Lab on event type shifting

5. Meaning Composition
5.1. Basic Assumptions
5.2. Co-composition
5.3. Coercion
5.4. Subselection

  • Lab or assignment on coercion
  • Lab on co-composition


James Pustejovsky holds the TJX Feldberg Chair in Computer Science at Brandeis University, where he directs the Lab for Linguistics and Computation, and chairs both the Program in Language and Linguistics and the Computational Linguistics Graduate Program. He has conducted research in computational linguistics, AI, lexical semantics, temporal reasoning, corpus linguistics, and language annotation. He has authored numerous books on computational semantics, computational linguistics, and corpus processing, including The Generative Lexicon, MIT, 1995; Semantics and the Lexicon, Springer, 1993; The Problem of Polysemy, CUP, 1996 (with B. Boguraev); The Language of Time, OUP, 2005 (with I. Mani and R. Gaizauskas); Interpreting Motion: Grounded Representations for Spatial Language, OUP, 2012 (with I. Mani); and Natural Language Annotation for Machine Learning, O’Reilly, 2012 (with A. Stubbs). Recent books include: Recent Advances in Generative Lexicon Theory, Springer, 2013; and A Guide to Generative Lexicon Theory, OUP, Forthcoming (with Elisabetta Ježek).

Elisabetta Ježek is an Associate Professor at the University of Pavia, where she has taught Syntax and Semantics and Applied Linguistics since 2001. Her research interests and areas of expertise include lexical semantics, verb classification, theory of argument structure, event structure in syntax and semantics, lexicon/ontology interplay, word class systems, and computational lexicography. She has edited a number of major works in lexicography and published contributions focusing on the interplay between corpus analysis, research methodology, and linguistic theory. Her publications include: Classi di Verbi tra Semantica e Sintassi, ETS, 2003; Lessico: Classi di Parole, Strutture, Combinazioni, Il Mulino, 2005 (2nd ed. 2011); The Lexicon: An Introduction, OUP, 2016; and A Guide to Generative Lexicon Theory, OUP, Forthcoming (with James Pustejovsky).


Asher, Nicholas. 2011. A Web of Words: Lexical Meaning in Context, Cambridge University Press, Cambridge.

Hanks, Patrick and James Pustejovsky. 2005. “A Pattern Dictionary for Natural Language Processing”. Revue française de linguistique appliquée, 10.2: 63-82.

Hanks, Patrick. 2013. Lexical Analysis: Norms and Exploitations, MIT Press, Cambridge, MA.

Im, Seohyun. 2013. “The generator of the event structure lexicon (GESL): automatic annotation of event structure for textual inference tasks”. Ph.D. Dissertation, Brandeis University.

Jezek, Elisabetta and James Pustejovsky. 2015. “Dynamic Argument Structure”, Università di Pavia and Brandeis University, manuscript.

Jezek, Elisabetta and Valeria Quochi. 2010. “Capturing Coercions in Texts: a First Annotation Exercise”. In Calzolari N. et al. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Valletta, Malta, May 19-21, 2010, 1464-1471, ELRA.

Jezek, Elisabetta and Laure Vieu. 2014. “Distributional analysis of copredication: Towards distinguishing systematic polysemy from coercion”. In Basili R., Lenci A., Magnini B. (eds.) First Italian Conference on Computational Linguistics CLiC-it 2014 (Dec. 9-10, 2014). Pisa: Pisa University Press, 219-223.

Pustejovsky, James. 1995. The Generative Lexicon, MIT Press, Cambridge, MA.

Pustejovsky, James. 2013. “Dynamic Event Structure and Habitat Theory”, Proceedings of GL2013, 1-20.

Pustejovsky, James and Elisabetta Jezek. 2008. “Semantic Coercion in Language: Beyond Distributional Analysis”, Italian Journal of Linguistics, 20:1, 181-214.

Pustejovsky, James and Anna Rumshisky. 2008. “Between chaos and structure: Interpreting lexical data through a theoretical lens”. International Journal of Lexicography, 21:3, 337-355.