Distributional semantics

Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. Its basic idea can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.

Distributional hypothesis

The distributional hypothesis in linguistics is derived from the semantic theory of language usage: words that are used and occur in the same contexts tend to have similar meanings.
The underlying idea that "a word is characterized by the company it keeps" was popularized by Firth in the 1950s.
The distributional hypothesis is the basis for statistical semantics. Although it originated in linguistics, it is now receiving attention in cognitive science, especially regarding the context of word use.
In recent years, the distributional hypothesis has provided the basis for the theory of similarity-based generalization in language learning: the idea that children can figure out how to use words they've rarely encountered before by generalizing about their use from distributions of similar words.
The distributional hypothesis suggests that the more semantically similar two words are, the more distributionally similar they will be in turn, and thus the more that they will tend to occur in similar linguistic contexts.
Whether or not this suggestion holds has significant implications both for the data-sparsity problem in computational modeling and for the question of how children are able to learn language so rapidly given relatively impoverished input.

Distributional semantic modeling in vector spaces

Distributional semantics favors the use of linear algebra as a computational tool and representational framework. The basic approach is to collect distributional information in high-dimensional vectors, and to define distributional/semantic similarity in terms of vector similarity. Different kinds of similarities can be extracted depending on which type of distributional information is used to build the vectors. Topical similarities can be extracted by populating the vectors with information on which text regions the linguistic items occur in. Paradigmatic similarities can be extracted by populating the vectors with information on which other linguistic items the items co-occur with. The latter type of vector can also be used to extract syntagmatic similarities by looking at the individual vector components.
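The approach described above can be illustrated with a minimal sketch. It assumes a toy four-sentence corpus, a symmetric context window of two words, raw co-occurrence counts and cosine similarity; the function names `cooccurrence_vectors` and `cosine` are illustrative choices and do not refer to any particular model or library.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, made up for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]

vectors = cooccurrence_vectors(corpus)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_milk = cosine(vectors["cat"], vectors["milk"])
print(f"cat ~ dog:  {cat_dog:.3f}")
print(f"cat ~ milk: {cat_milk:.3f}")
```

In this toy setting "cat" and "dog" receive a high similarity because they share most of their neighbours, while "cat" and "milk" score low even though they occur next to each other. A realistic model would use a far larger corpus and would normally reweight the raw counts.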
The basic idea of a correlation between distributional and semantic similarity can be operationalized in many different ways. There is a rich variety of computational models implementing distributional semantics, including latent semantic analysis, Hyperspace Analogue to Language, syntax- or dependency-based models, random indexing, semantic folding and several variants of the topic model.
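Distributional semantic models differ primarily with respect to the following parameters:

context type (text regions or linguistic items);
context window (size, extension, etc.);
frequency weighting (e.g. entropy, pointwise mutual information);
dimension reduction (e.g. random indexing, singular value decomposition);
similarity measure (e.g. cosine similarity, Minkowski distance).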

Distributional semantic models that use linguistic items as context have also been referred to as word space, or vector space models.

Beyond lexical semantics

Distributional semantics has typically been applied to lexical items (words and multi-word terms) with considerable success, not least because of its applicability as an input layer for neurally inspired deep learning models. Lexical semantics, i.e. the meaning of words, nevertheless carries only part of the semantics of an entire utterance. The meaning of a clause, e.g. "Tigers love rabbits.", can only partially be understood from examining the meaning of the three lexical items it consists of. Distributional semantics can straightforwardly be extended to cover larger linguistic items such as constructions, with and without non-instantiated items, but some of the base assumptions of the model need to be adjusted. Construction grammar and its formulation of the lexical-syntactic continuum offers one approach for including more elaborate constructions in a distributional semantic model, and some experiments have been implemented using the random indexing approach.
Compositional distributional semantic models extend distributional semantic models with explicit semantic functions that use syntactically based rules to combine the semantics of participating lexical units into a compositional model that characterizes the semantics of entire phrases or sentences. Different approaches to composition, including neural models, have been explored and are under discussion at established workshops such as SemEval.
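As a rough illustration of why composition functions need syntactic information, the sketch below combines word vectors by element-wise addition and by element-wise multiplication. These are simple baseline operations, not the syntax-driven functions described above, and the three-dimensional vectors in `lexicon` are invented values rather than vectors estimated from data.

```python
from functools import reduce

# Hypothetical toy vectors; the three dimensions and their values are made up.
lexicon = {
    "tigers": [3.0, 1.0, 1.0],
    "love": [1.0, 2.0, 1.0],
    "rabbits": [1.0, 1.0, 3.0],
}


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def multiply(u, v):
    """Element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def compose(words, combine):
    """Fold the word vectors of a phrase into one vector using `combine`."""
    return reduce(combine, (lexicon[word] for word in words))


phrase = ["tigers", "love", "rabbits"]
reversed_phrase = ["rabbits", "love", "tigers"]

additive = compose(phrase, add)
multiplicative = compose(phrase, multiply)
print(additive)
print(multiplicative)

# Both operations are commutative, so word order is lost:
print(compose(reversed_phrase, add) == additive)
```

Because addition and multiplication ignore order, "Tigers love rabbits" and "Rabbits love tigers" receive the same representation here, which is one motivation for composition functions that take syntactic structure into account.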

Applications

Distributional semantic models have been applied successfully to the following tasks:

finding semantic similarity between words and multi-word expressions;
word clustering based on semantic similarity;
automatic creation of thesauri and bilingual dictionaries;
word sense disambiguation;
expanding search requests using synonyms and associations;
defining the topic of a document;
document clustering for information retrieval;
data mining and named entity recognition;
creating semantic maps of different subject domains;
paraphrasing;
sentiment analysis;
modeling selectional preferences of words.

Software

People

Scott Deerwester
Susan Dumais
J. R. Firth
George Furnas
Zellig Harris
Richard Hirschman
Thomas Landauer
Magnus Sahlgren
Hinrich Schütze
