WordNet


WordNet is a lexical database of semantic relations between words in more than 200 languages. WordNet links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into synsets with short definitions and usage examples. WordNet can thus be seen as a combination and extension of a dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. WordNet was first created in the English language and the English WordNet database and software tools have been released under a BSD style license and are freely available for download from that WordNet website.

History and team members

WordNet was first created in English only in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller starting in 1985 and has been directed in recent years by Christiane Fellbaum. The project was initially funded by the U.S. Office of Naval Research and later also by other U.S. government agencies including the DARPA, the National Science Foundation, the Disruptive Technology Office, and REFLEX. George Miller and Christiane Fellbaum were awarded the 2006 Antonio Zampolli Prize for their work with WordNet.
The Global WordNet Association is a non-commercial organization that provides a platform for discussing, sharing and connecting WordNets for all languages in the world, and has Christiane Fellbaum and Piek Th.J.M. Vossen and as co-presidents.

Database contents

The database contains 155 327 words organized in 175 979 synsets for a total of 207 016 word-sense pairs; in compressed form, it is about 12 megabytes in size.
WordNet includes the lexical categories nouns, verbs, adjectives and adverbs but ignores prepositions, determiners and other function words.
Words from the same lexical category that are roughly synonymous are grouped into synsets. Synsets include simplex words as well as collocations like "eat out" and "car pool." The different senses of a polysemous word form are assigned to different synsets. The meaning of a synset is further clarified with a short defining gloss and one or more usage examples. An example adjective synset is:
All synsets are connected to other synsets by means of semantic relations. These relations, which are not all shared by all lexical categories, include:
These semantic relations hold among all members of the linked synsets. Individual synset members can also be connected with lexical relations. For example, the noun "director" is linked to the verb "direct" from which it is derived via a "morphosemantic" link.
The morphology functions of the software distributed with the database try to deduce the lemma or stem form of a word from the user's input. Irregular forms are stored in a list, and looking up "ate" will return "eat," for example.

Knowledge structure

Both nouns and verbs are organized into hierarchies, defined by hypernym or IS A relationships. For instance, one sense of the word dog is found following hypernym hierarchy; the words at the same level represent synset members. Each set of synonyms has a unique index.
At the top level, these hierarchies are organized into 25 beginner "trees" for nouns and 15 for verbs. All are linked to a unique beginner synset, "entity".
Noun hierarchies are far deeper than verb hierarchies
Adjectives are not organized into hierarchical trees. Instead, two "central" antonyms such as "hot" and "cold" form binary poles, while 'satellite' synonyms such as "steaming" and "chilly" connect to their respective poles via a "similarity" relations. The adjectives can be visualized in this way as "dumbbells" rather than as "trees".

Psycholinguistic aspects

The initial goal of the WordNet project was to build a lexical database that would be consistent with theories of human semantic memory developed in the late 1960s. Psychological experiments indicated that speakers organized their knowledge of concepts in an economic, hierarchical fashion. Retrieval time required to access conceptual knowledge seemed to be directly related to the number of hierarchies the speaker needed to "traverse" to access the knowledge. Thus, speakers could more quickly verify that canaries can sing because a canary is a songbird, but required slightly more time to verify that canaries can fly and even more time to verify canaries have skin.
While such psycholinguistic experiments and the underlying theories have been subject to criticism, some of WordNet's organization is consistent with experimental evidence. For example, anomic aphasia selectively affects speakers' ability to produce words from a specific semantic category, a WordNet hierarchy. Antonymous adjectives are found to co-occur far more frequently than chance, a fact that has been found to hold for many languages.

As a lexical ontology

WordNet is sometimes called an ontology, a persistent claim that its creators do not make. The hypernym/hyponym relationships among the noun synsets can be interpreted as specialization relations among conceptual categories. In other words, WordNet can be interpreted and used as a lexical ontology in the computer science sense. However, such an ontology should be corrected before being used, because it contains hundreds of basic semantic inconsistencies; for example there are, common specializations for exclusive categories and redundancies in the specialization hierarchy. Furthermore, transforming WordNet into a lexical ontology usable for knowledge representation should normally also involve distinguishing the specialization relations into subtypeOf and instanceOf relations, and associating intuitive unique identifiers to each category. Although such corrections and transformations have been performed and documented as part of the integration of WordNet 1.7 into the cooperatively updatable knowledge base of WebKB-2, most projects claiming to re-use WordNet for knowledge-based applications simply re-use it directly.
WordNet has also been converted to a formal specification, by means of a hybrid bottom-up top-down methodology to automatically extract association relations from WordNet, and interpret these associations in terms of a set of conceptual relations, formally defined in the DOLCE foundational ontology.
In most works that claim to have integrated WordNet into ontologies, the content of WordNet has not simply been corrected when it seemed necessary; instead, WordNet has been heavily re-interpreted and updated whenever suitable. This was the case when, for example, the top-level ontology of WordNet was re-structured according to the OntoClean based approach or when WordNet was used as a primary source for constructing the lower classes of the SENSUS ontology.

Limitations

The most widely discussed limitation of WordNet is that some of the semantic relations are more suited to concrete concepts than to abstract concepts.. For example, it is easy to create hyponyms/hypernym relationships to capture that a "conifer" is a type of "tree", a "tree" is a type of "plant", and a "plant" is a type of "organism", but it is difficult to classify emotions like "fear" or "happiness" into equally deep and well-defined hyponyms/hypernym relationships.
Many of the concepts in WordNet are specific to certain languages and the most accurate reported mapping between languages is 94%. Synonyms, hyponyms, meronyms, and antonyms occur in all languages with a WordNet so far, but other semantic relationships are language-specific. This limits the interoperability across languages. However, it also makes WordNet a resource for highlighting and studying the differences between languages, so it is not necessarily a limitation for all use cases.
WordNet does not include information about the etymology or the pronunciation of words and it contains only limited information about usage. WordNet aims to cover most everyday words and does not include much domain-specific terminology.
WordNet is the most commonly used computational lexicon of English for word sense disambiguation, a task aimed to assigning the context-appropriate meanings to words in a text. However, it has been argued that WordNet encodes sense distinctions that are too fine-grained. This issue prevents WSD systems from achieving a level of performance comparable to that of humans, who do not always agree when confronted with the task of selecting a sense from a dictionary that matches a word in a context. The granularity issue has been tackled by proposing clustering methods that automatically group together similar senses of the same word.

Offensive Content

WordNet includes words that can be perceived as pejorative or offensive. The interpretation of a word can change over time and between social groups, so it is not always possible for WordNet to define a word as "pejorative" or "offensive" in isolation. Therefore, people using WordNet must apply their own methods to identify offensive or pejorative words.
However, this limitation is true of other lexical resources like dictionaries and thesauruses, which also contain pejorative and offensive words. Some dictionaries indicate words that are pejoratives, but do not include all the contexts in which words might be acceptable or offensive to different social groups. Therefore, people using dictionaries must apply their own methods to identify all offensive words.

Licensed vs. Open WordNets

Some wordnets were subsequently created for other languages. A 2012 survey lists the wordnets and their availability. In an effort to propagate the usage of WordNets, the Global WordNet community had been slowly re-licensing their WordNets to an open domain where researchers and developers can easily access and use WordNets as language resources to provide ontological and lexical knowledge in Natural Language Processing tasks.
The Open Multilingual WordNet provides access to open licensed wordnets in a variety of languages, all linked to the Princeton Wordnet of English. The goal is to make it easy to use wordnets in multiple languages.

Applications

WordNet has been used for a number of purposes in information systems, including word-sense disambiguation, information retrieval, automatic text classification, automatic text summarization, machine translation and even automatic crossword puzzle generation.
A common use of WordNet is to determine the similarity between words. Various algorithms have been proposed, including measuring the distance among words and synsets in WordNet's graph structure, such as by counting the number of edges among synsets. The intuition is that the closer two words or synsets are, the closer their meaning. A number of WordNet-based word similarity algorithms are implemented in a Perl package called WordNet::Similarity, and in a Python package called NLTK. Other more sophisticated WordNet-based similarity techniques include ADW, whose implementation is available in Java. WordNet can also be used to inter-link other vocabularies.

Interfaces

Princeton maintains a list of related projects that includes links to some of the widely used application programming interfaces available for accessing WordNet using various programming languages and environments.

Related projects and extensions

WordNet is connected to several databases of the Semantic Web. WordNet is also commonly re-used via mappings between the WordNet synsets and the categories from ontologies. Most often, only the top-level categories of WordNet are mapped.

Global WordNet Association

The Global WordNet Association is a public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world. The GWA also promotes the standardization of wordnets across languages, to ensure its uniformity in enumerating the synsets in human languages. The GWA keeps a list of wordnets developed around the world.

Other languages

Projects such as BalkaNet and EuroWordNet made it feasible to create standalone wordnets linked to the original one. One of such projects was Russian WordNet patronized by Petersburg State University of Means of Communication led by S.A. Yablonsky or Russnet by Saint Petersburg State University
WordNet Database is distributed as a dictionary package for the following software: