Lesk algorithm
The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986.
Overview
The Lesk algorithm is based on the assumption that words in a given "neighborhood" will tend to share a common topic. A simplified version of the Lesk algorithm is to compare the dictionary definition of an ambiguous word with the terms contained in its neighborhood. Versions have been adapted to use WordNet. An implementation might look like this:
- for every sense of the word being disambiguated, count the number of words that occur both in the neighborhood of that word and in the dictionary definition of that sense
- choose the sense with the highest count
A classic example is disambiguating the phrase "pine cone" given the following dictionary entries:
PINE
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
CONE
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees
As can be seen, the best intersection is Pine #1 ⋂ Cone #3 = 2, since those glosses share the words "evergreen" and "tree".
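This overlap count can be reproduced in a few lines of Python. The sketch below is a minimal illustration using the glosses above; the small stop list and the crude plural-stripping (standing in for real lemmatization) are assumptions of the sketch, not part of the original example.

PINE = {
    1: "kinds of evergreen tree with needle-shaped leaves",
    2: "waste away through sorrow or illness",
}
CONE = {
    1: "solid body which narrows to a point",
    2: "something of this shape whether solid or hollow",
    3: "fruit of certain evergreen trees",
}
STOP = {"a", "of", "or", "to", "with", "which", "this", "whether", "through"}  # toy stop list

def tokens(gloss):
    # lowercase, drop stop words, strip a plural "s" so "tree" matches "trees"
    return {w.lower().rstrip("s") for w in gloss.split() if w.lower() not in STOP}

best = max(
    ((p, c, len(tokens(gp) & tokens(gc)))
     for p, gp in PINE.items()
     for c, gc in CONE.items()),
    key=lambda t: t[2],
)
print(best)  # (1, 3, 2): Pine #1 and Cone #3 overlap on "evergreen" and "tree"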
Simplified Lesk algorithm
In the simplified Lesk algorithm, the correct meaning of each word in a given context is determined individually by locating the sense whose dictionary definition has the greatest overlap with that context. Rather than simultaneously determining the meanings of all words in a given context, this approach tackles each word individually, independent of the meaning of the other words occurring in the same context.
A comparative evaluation performed by Vasilescu et al. has shown that the simplified Lesk algorithm can significantly outperform the original definition of the algorithm, both in terms of precision and efficiency. Evaluating the disambiguation algorithms on the Senseval-2 English all-words data, they measured 58% precision with the simplified Lesk algorithm, compared with only 42% for the original algorithm.
Note: The Vasilescu et al. implementation uses a back-off strategy for words not covered by the algorithm, consisting of the most frequent sense defined in WordNet. This means that words for which all possible meanings lead to zero overlap with the current context or with other word definitions are by default assigned sense number one in WordNet.
Simplified Lesk algorithm with smart default word sense (Vasilescu et al., 2004):
function SIMPLIFIED LESK(word, sentence) returns best sense of word
    best-sense ← most frequent sense for word
    max-overlap ← 0
    context ← set of words in sentence
    for each sense in senses of word do
        signature ← set of words in the gloss and examples of sense
        overlap ← COMPUTEOVERLAP(signature, context)
        if overlap > max-overlap then
            max-overlap ← overlap
            best-sense ← sense
    end
    return best-sense
The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way.
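As a concrete rendering of this pseudocode, here is a minimal Python sketch using NLTK's WordNet interface. It assumes nltk and its wordnet corpus are installed, and the stop list is a toy stand-in (NLTK also ships its own implementation as nltk.wsd.lesk). Defaulting to the first, most frequent WordNet sense mirrors the smart-default back-off described above.

from nltk.corpus import wordnet as wn

STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "is", "i"}  # toy stop list

def compute_overlap(signature, context):
    # number of words the two sets share, ignoring stop words
    return len((signature & context) - STOPWORDS)

def simplified_lesk(word, sentence):
    senses = wn.synsets(word)
    if not senses:
        return None
    context = set(sentence.lower().split())
    best_sense, max_overlap = senses[0], 0  # default: WordNet's most frequent sense
    for sense in senses:
        # the signature is the set of words in the gloss and examples of the sense
        signature = set(sense.definition().lower().split())
        for example in sense.examples():
            signature |= set(example.lower().split())
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            max_overlap, best_sense = overlap, sense
    return best_sense

print(simplified_lesk("bank", "I went to the bank to deposit some money"))
# typically resolves to the financial-institution sense of "bank"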
Criticisms and other Lesk-based methods
Unfortunately, Lesk's approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation in that dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions.
Much subsequent work has proposed modifications of this algorithm. These variants draw on additional resources for analysis: for instance, synonyms, morphological derivatives, or words from the definitions of the words that themselves appear in definitions.
There are a lot of studies concerning Lesk and its extensions:
- Kwong, 2001;
- Nastase and Szpakowicz, 2001;
- Wilks and Stevenson, 1998, 1999;
- Mahesh et al., 1997;
- Cowie et al., 1992;
- Yarowsky, 1992;
- Pook and Catlett, 1988;
- Kilgarriff and Rosensweig, 2000;
- Gelbukh and Sidorov, 2004.
Lesk variants
- Original Lesk (Lesk, 1986)
- Adapted/Extended Lesk: In the adapted Lesk algorithm, a word vector is created for every content word in the WordNet gloss; it holds the co-occurrence counts of words that co-occur with that word in a large corpus. Concatenating the glosses of related concepts in WordNet can be used to augment this vector. The gloss vector g for a concept is created by adding the word vectors of all the content words in its gloss. Relatedness between two concepts is then determined by comparing their gloss vectors using the cosine similarity measure, as in the sketch below.
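A minimal sketch of this gloss-vector comparison follows; the co-occurrence counts are hypothetical stand-ins for counts collected from a large corpus, and a real system would cover the full WordNet vocabulary rather than a handful of words.

from collections import Counter
from math import sqrt

def gloss_vector(gloss_words, cooccurrence):
    # the gloss vector is the sum of the corpus co-occurrence vectors
    # of the content words in the gloss
    g = Counter()
    for w in gloss_words:
        g.update(cooccurrence.get(w, {}))
    return g

def cosine(u, v):
    # cosine similarity between two sparse count vectors
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# hypothetical co-occurrence counts for a handful of words
cooc = {
    "evergreen": {"tree": 12, "forest": 7, "needle": 3},
    "fruit": {"tree": 9, "seed": 5, "ripe": 4},
}

g1 = gloss_vector(["fruit", "evergreen"], cooc)  # e.g. content words of Cone #3's gloss
g2 = gloss_vector(["evergreen"], cooc)           # e.g. content words of Pine #1's gloss
print(cosine(g1, g2))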