Machine-readable dictionary


Machine-readable dictionary is a dictionary stored as machine data instead of being printed on paper. It is an electronic dictionary and lexical database.
A machine-readable dictionary is a dictionary in an electronic form that can be loaded in a database and can be queried via application software. It may be a single language explanatory dictionary or a multi-language dictionary to support translations between two or more languages or a combination of both. Translation software between multiple languages usually apply bidirectional dictionaries. An MRD may be a dictionary with a proprietary structure that is queried by dedicated software or it can be a dictionary that has an open structure and is available for loading in computer databases and thus can be used via various software applications. Conventional dictionaries contain a lemma with various descriptions. A machine-readable dictionary may have additional capabilities and is therefore sometimes called a smart dictionary. An example of a smart dictionary is the Open Source Gellish English dictionary.

The term dictionary is also used to refer to an electronic vocabulary or lexicon as used for example in spelling checkers. If dictionaries are arranged in a subtype-supertype hierarchy of concepts then it is called a taxonomy. If it also contains other relations between the concepts, then it is called an ontology. Search engines may use either a vocabulary, a taxonomy or an ontology to optimise the search results. Specialised electronic dictionaries are morphological dictionaries or syntactic dictionaries.

The term MRD is often contrasted with NLP dictionary, in the sense that an MRD is the electronic form of a dictionary which was printed before on paper. Although being both used by programs, in contrast, the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind. An ISO standard for MRD and NLP is able to represent both structures and is called Lexical Markup Framework.

History

The first widely distributed MRDs were the Merriam-Webster Seventh Collegiate and the Merriam-Webster New Pocket Dictionary. Both were produced by a government-funded project at System Development Corporation under the direction of John Olney. They were manually keyboarded as no typesetting tapes of either book were available. Originally each was distributed on multiple reels of magnetic tape as card images with each separate word of each definition on a separate punch card with numerous special codes indicating the details of its usage in the printed dictionary. Olney outlined a grand plan for the analysis of the definitions in the dictionary, but his project expired before the analysis could be carried out. Robert Amsler at the University of Texas at Austin resumed the analysis and completed a taxonomic description of the Pocket Dictionary under National Science Foundation funding, however his project expired before the taxonomic data could be distributed. Roy Byrd et al. at IBM Yorktown Heights resumed analysis of the Webster's Seventh Collegiate following Amsler's work. Finally, in the 1980s starting with initial support from Bellcore and later funded by various U.S. federal agencies, including NSF, ARDA, DARPA, DTO, and REFLEX, George Armitage Miller and Christiane Fellbaum at Princeton University completed the creation and wide distribution of a dictionary and its taxonomy in the WordNet project, which today stands as the most widely distributed computational lexicology resource.