
The ongoing "Corpus Pattern Analysis" project is building a "Pattern Dictionary of English Verbs" (PDEV). PDEV is an inventory of the normal patterns of use of English verbs, with links in each case to an unambiguous meaning.

The problem that PDEV aims to solve is the highly ambiguous nature of the link between words and meanings. First, it is necessary to recognize that in natural language words don't have meaning -- they have only "meaning potential" -- something that is so vague, variable, and ambiguous that it cannot be used as a basis for reliable disambiguation, computing, or anything else. What we see in standard English dictionaries are lists of undifferentiated meaning potentials. The human user has to guess at the differentiating criteria. Humans are good at guessing such criteria, so they can use ordinary dictionaries. Computers aren't, so they can't.

The problem is soluble, however, if we take a closer look at how words are actually used to make meanings. We see that meanings in text depend (probabilistically, statistically) on patterns of interaction between nouns and verbs. The aim, therefore, is to establish an inventory of patterns (PDEV). It turns out that patterns are almost always unambiguous.

The next step is to develop computational techniques for relating unseen uses of a verb in text to the relevant pattern.
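To make the task concrete, the sketch below (an illustration only, not the project's own implementation) matches an unseen use of a verb to a candidate pattern by checking how well the semantic types expected in each argument slot fit the argument heads actually observed; the small type lexicon and pattern inventory are invented for the example.

    # Hypothetical sketch: relate an unseen use of a verb to a CPA-style pattern
    # by scoring how well each pattern's expected semantic types fit the observed
    # argument heads. The type lexicon and pattern inventory are invented examples.

    TYPE_LEXICON = {
        "plaintiff": {"Human"},
        "company":   {"Human", "Institution"},
        "lawsuit":   {"Procedure"},
        "report":    {"Document"},
        "nail":      {"Artifact"},
    }

    PATTERNS = {
        "file": [
            {"id": 1, "subject": "Human", "object": "Procedure"},  # file a lawsuit
            {"id": 2, "subject": "Human", "object": "Document"},   # file a report
            {"id": 3, "subject": "Human", "object": "Artifact"},   # file (smooth) a nail
        ],
    }

    def match_pattern(verb, subject_head, object_head):
        """Return the candidate pattern whose semantic types best fit the arguments."""
        def score(pattern):
            s = 0
            if pattern["subject"] in TYPE_LEXICON.get(subject_head, set()):
                s += 1
            if pattern["object"] in TYPE_LEXICON.get(object_head, set()):
                s += 1
            return s
        return max(PATTERNS[verb], key=score)

    print(match_pattern("file", "plaintiff", "lawsuit"))  # -> pattern 1 (file a lawsuit)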

Corpus Pattern Analysis (CPA) is a new technique for mapping meaning onto words in text. It is currently being used to build a 'Pattern Dictionary of English Verbs', which will be a fundamental resource for use in computational linguistics, language teaching, and cognitive science. It is based on the Theory of Norms and Exploitations (TNE, see Hanks 2004 and 2013, Hanks and Pustejovsky 2005). TNE is in turn a theory that owes much to the work of Pustejovsky on the Generative Lexicon (see Pustejovsky 1995), to Wilks's theory of preference semantics (e.g. Wilks 1975), to Sinclair's work on corpus analysis and collocations (e.g. Sinclair 1966, 1987, 1991, 2004), to the Cobuild project in lexical computing (Sinclair et al. 1987), and to the Hector project (Atkins 1993; Hanks 1994). CPA is also influenced by frame semantics (Fillmore and Atkins 1992). It is complementary to FrameNet: where FrameNet offers an in-depth analysis of semantic frames, CPA offers a systematic analysis of the patterns of meaning and use of each verb. Each CPA pattern can in principle be plugged into a FrameNet semantic frame. Some recent work (Jackendoff 2002) has complained about the excessive "syntactocentrism" of American linguistics in the 20th century. TNE offers a lexicocentric approach, with opportunities for synthesis, which will go some way towards redressing the balance.

The focus of the analysis is on the prototypical syntagmatic patterns with which words in use are associated. Patterns for verbs and patterns for nouns are different in kind. Noun patterns will consist of corpus-driven gnomic statements, into which all relevant collocates will be incorporated, to give a full picture of the noun's idiomatic use. Verb patterns consist not only of the basic "argument structure" or "valency structure" of each verb (typically with semantic values stated for each of the elements), but also of subvalency features, where relevant, such as the presence or absence of a determiner in noun phrases constituting a direct object. For example, the meaning of take place is quite different from the meaning of take his place. The possessive determiner makes all the difference to the meaning.
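By way of illustration (the field names below are invented for this sketch and are not the PDEV schema), a verb pattern with a determiner subvalency feature might be represented roughly as follows, capturing the contrast between take place and take his place:

    # Illustrative sketch only: representing the determiner subvalency feature
    # that separates "take place" from "take his place". Field names and
    # implicature glosses are invented for this example, not taken from PDEV.

    take_place = {
        "verb": "take",
        "object": {"lexical_set": ["place"], "determiner": None},
        "implicature": "[[Event]] happens",
    }

    take_his_place = {
        "verb": "take",
        "object": {"lexical_set": ["place"], "determiner": "possessive"},
        "implicature": "[[Human 1]] replaces or substitutes for [[Human 2]]",
    }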

No attempt is made in CPA to identify the meaning of a verb or noun directly, as a word in isolation. Instead, meanings are associated with prototypical sentence contexts. Concordance lines are grouped into semantically motivated syntagmatic patterns. Associating a "meaning" with each pattern is a secondary step, carried out in close coordination with the assignment of concordance lines to patterns. The identification of a syntagmatic pattern is not an automatic procedure: it calls for a great deal of lexicographic art. Among the most difficult of all lexicographic decisions is the selection of an appropriate level of generalization on the basis of which senses are to be distinguished. For example, one might say that the intransitive verb abate has only one sense ("become less in intensity"), or one might separate storm abate from political protest abate, on the grounds that the two contexts have different implicatures. That is a simple example, but in more complex cases (e.g. the verb bear) patterns are indispensable for effective disambiguation. Bearing a heavy burden is a pattern that normally has an abstract interpretation in English (as opposed to, say, carrying a heavy load); the meaning is associated with the prototypical phrase, and is quite different in turn from that of I can't bear it.

In CPA, the "meaning" of a pattern is expressed as a set of basic implicatures. E.g., for the verb file one pattern is: [[Human = Plaintiff]] file [[Procedure = Lawsuit]], of which the implicature may be expressed as "If you file a lawsuit, you are acting as the plaintiff and you activate a procedure by which you hope to obtain redress for some wrong that you believe has been done to you". Depending on the proposed application, the implicature of a pattern may be expressed in any of a wide variety of other ways, e.g. as a translation into another language or as a synonym set such as "file = activate, start, begin, lodge". Each argument of each pattern is linked to a node in a shallow semantic ontology (Pustejovsky et al. 2004).
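A rough sketch of how such a pattern and its implicature might be represented (the structure and field names here are assumptions for illustration, not the PDEV data model):

    # Rough sketch, not the PDEV data model: the "file a lawsuit" pattern with
    # its arguments typed against a shallow semantic ontology, plus two
    # alternative renderings of the implicature (a prose gloss and a synonym set).

    file_lawsuit = {
        "verb": "file",
        "pattern_no": 1,  # illustrative numbering
        "arguments": {
            "subject": {"semantic_type": "Human", "contextual_role": "Plaintiff"},
            "object":  {"semantic_type": "Procedure", "contextual_role": "Lawsuit"},
        },
        "implicature": (
            "If you file a lawsuit, you are acting as the plaintiff and you "
            "activate a procedure by which you hope to obtain redress for some "
            "wrong that you believe has been done to you."
        ),
        "synonym_set": ["activate", "start", "begin", "lodge"],
    }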

Every verb pattern is supported by links to analysed evidence of usage in a corpus (at present, the British National Corpus). Corpus samples are randomly selected and typically consist of 250 corpus lines (or a multiple thereof) for each verb. Every line in the sample is classified. This enables CPA to complement the Sketch Engine’s focus on statistically significant collocations.

Most corpus lines are ‘normal uses’, assigned to a specific pattern. However, a few corpus lines are classified as a creative ‘exploitation’ of a normal use and tagged as such. There are three subclasses of exploitations: semantically anomalous arguments, figurative uses, and syntactically anomalous structures. In this way, CPA is building up an unrivalled collection of examples of creative language use, as a spin-off from its central task of identifying normal usage and meanings.
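A minimal sketch of how such a classified sample might be tallied (the tag names and example lines are invented; only the norm/exploitation distinction and the three subclasses come from the description above):

    # Minimal sketch: tally a classified corpus sample. Each line is tagged
    # either as a normal use of a numbered pattern or as one of the three
    # exploitation subclasses described above. The example lines are invented.

    from collections import Counter

    EXPLOITATION_SUBCLASSES = {"anomalous_argument", "figurative", "anomalous_syntax"}

    sample = [
        {"line": "The storm abated overnight.",                      "tag": ("norm", 1)},
        {"line": "Political protests abated after the vote.",        "tag": ("norm", 1)},
        {"line": "The storm of criticism showed no sign of abating.", "tag": ("exploitation", "figurative")},
    ]

    # Sanity-check the tags, then count how often each pattern or exploitation
    # subclass occurs in the sample.
    for entry in sample:
        kind, value = entry["tag"]
        assert kind == "norm" or value in EXPLOITATION_SUBCLASSES

    tally = Counter(entry["tag"] for entry in sample)
    print(tally)  # relative frequency of each pattern and exploitation subclass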

Among other features, PDEV also provides information about the relative frequency of each phraseological pattern.
