Lexical substitution


Lexical substitution is the task of identifying a substitute for a word in the context of a clause. For instance, given the text "After the match, replace any remaining fluid deficit to prevent chronic dehydration throughout the tournament", the word game might be given as a substitute for match.
Lexical substitution is closely related to word sense disambiguation (WSD), in that both aim to determine the meaning of a word. However, while WSD consists of automatically assigning the appropriate sense from a fixed sense inventory, lexical substitution does not impose any constraint on which substitute to choose as the best representative for the word in context. By not prescribing the inventory, lexical substitution avoids the issue of the granularity of sense distinctions and provides a level playing field for systems that acquire word senses automatically.

Evaluation

To evaluate automatic systems on lexical substitution, a task was organized at the SemEval-2007 evaluation competition held in Prague in 2007. A task on cross-lingual lexical substitution has also taken place, at SemEval-2010.

Skip-gram model

The skip-gram model maps words to vectors in an N-dimensional space, so that words with similar meanings lie close to each other. The vectors are learned by a shallow neural network that is trained to predict the words surrounding a given word. The input and output layers of the network have one dimension for each word in the vocabulary, while the learned word vectors have a much smaller, fixed number of dimensions.
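Skip-gram training is usually described in terms of (target, context) word pairs drawn from a sliding window over the text. The following is a minimal sketch of how such pairs might be generated; the function name skipgram_pairs and the window size of 2 are assumptions made for illustration, and the sketch does not train a network.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for all tokens within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


tokens = "the dog walked at a quick pace".split()
pairs = skipgram_pairs(tokens, window=2)
```

Each pair is one training example: the network receives the target word and is asked to predict the context word.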
The model has been used in algorithms that automate lexical substitution. One such algorithm, developed by Oren Melamud, Omer Levy, and Ido Dagan, uses the skip-gram model to find a vector for each word and its synonyms. It then calculates the cosine distance between the vectors to determine which words are the best substitutes.
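The ranking step can be illustrated with a small sketch. This is not the authors' implementation: the three-dimensional vectors in toy_embeddings are made up for illustration (real skip-gram vectors typically have hundreds of dimensions), and the function names are hypothetical. Candidates are ranked by cosine similarity, which is equivalent to ranking by smallest cosine distance, since the distance is one minus the similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_substitutes(target, candidates, embeddings):
    """Sort candidates by similarity to the target, most similar first."""
    target_vec = embeddings[target]
    scored = [(c, cosine_similarity(target_vec, embeddings[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors, assumed for illustration only.
toy_embeddings = {
    "match": [0.9, 0.1, 0.3],
    "game": [0.8, 0.2, 0.4],
    "contest": [0.7, 0.0, 0.6],
    "lighter": [0.1, 0.9, 0.2],
}

ranking = rank_substitutes("match", ["lighter", "game", "contest"], toy_embeddings)
```

With these toy vectors, game ranks above contest, and lighter ranks last. Note that the sketch compares word vectors only and does not take the surrounding sentence into account.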

Example

In a sentence like "The dog walked at a quick pace", each word is represented by its own vector over the vocabulary. Taking the seven words of the sentence as the vocabulary, the vector for "The" would be [1,0,0,0,0,0,0], because the 1 marks the position of the word itself in the vocabulary and the 0s mark the other words surrounding it.
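This kind of encoding is commonly called a one-hot vector. A minimal sketch, assuming the vocabulary consists of just the seven words of the example sentence in their original order (the function name one_hot is hypothetical):

```python
def one_hot(word, vocabulary):
    """Return a vector with a 1 at the word's vocabulary index and 0s elsewhere."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector


vocabulary = ["The", "dog", "walked", "at", "a", "quick", "pace"]
the_vector = one_hot("The", vocabulary)  # [1, 0, 0, 0, 0, 0, 0]
dog_vector = one_hot("dog", vocabulary)  # [0, 1, 0, 0, 0, 0, 0]
```

In a skip-gram network, such a vector serves as the input that selects the word whose dense vector is being learned.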