Taxonomy for search engines


Taxonomy for search engines refers to classification methods that improve relevance in vertical search. Taxonomies of entities are tree structures whose nodes are labelled with entities likely to occur in a web search query. Search engines use these trees to match keywords in a search query against keywords in candidate answers.
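As a minimal sketch of this idea, the Python fragment below stores a taxonomy as a tree of labelled nodes and checks which nodes a query and a candidate answer have in common. The class `TaxonomyNode`, the helpers `find_paths` and `matched_labels`, and the toy tax-advice taxonomy are all hypothetical names and data chosen for illustration. Matching is plain whitespace tokenisation with exact label comparison, which is an assumption and not a description of any particular search engine.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One node of the taxonomy tree, labelled with an entity."""
    label: str
    children: list[TaxonomyNode] = field(default_factory=list)


def find_paths(node, terms, path=()):
    """Return root-to-node label paths for every node whose label is in `terms`."""
    current = path + (node.label,)
    matches = [current] if node.label.lower() in terms else []
    for child in node.children:
        matches.extend(find_paths(child, terms, current))
    return matches


def matched_labels(root, text):
    """Return the set of taxonomy labels that occur as words in `text`."""
    terms = set(text.lower().split())
    return {path[-1] for path in find_paths(root, terms)}


# Hypothetical toy taxonomy for a tax-advice vertical.
taxonomy = TaxonomyNode("tax", [
    TaxonomyNode("deduction", [TaxonomyNode("mortgage"), TaxonomyNode("charity")]),
    TaxonomyNode("return", [TaxonomyNode("extension")]),
])

query = "can I claim a mortgage deduction"
answer = "the mortgage interest deduction is limited for new loans"

query_paths = find_paths(taxonomy, set(query.lower().split()))
shared = matched_labels(taxonomy, query) & matched_labels(taxonomy, answer)
```

Here `query_paths` lists where the query keywords sit in the tree, and `shared` holds the entities that the query and the answer both mention. A ranking function could, for example, prefer answers with a larger `shared` set.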
Taxonomies, thesauri and concept hierarchies are crucial components of many applications in information retrieval, natural language processing and knowledge management. Building, tuning and managing taxonomies and ontologies is costly because it requires many manual operations. A number of studies have proposed building taxonomies automatically from linguistic resources, statistical machine learning, or both. A number of tools that use the SKOS (Simple Knowledge Organization System) standard are also available to streamline work with taxonomies.
Web mining is one approach to building a search engine taxonomy. The taxonomy construction process starts from seed entities and mines the available source domains for new entities associated with these seed entities. The process forms new entities by applying machine learning to the current web search results for existing entities in order to identify commonalities between them. These commonality expressions then become parameters of the existing entities and turn into new entities at the next learning iteration.
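The iterative process described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published method: the machine learning step is replaced by simple term-frequency counting, "commonalities" are read as terms shared by at least `min_support` search results for the same entity, and `search` is a hypothetical stand-in that returns canned snippets instead of calling a real search engine. All names and data here (`FAKE_RESULTS`, `common_terms`, `build_taxonomy`) are invented for the example.

```python
from collections import Counter

# Hypothetical canned search results: entity -> result snippets.
FAKE_RESULTS = {
    "tax": [
        "tax deduction rules",
        "claim a tax deduction",
        "file a tax return",
        "tax return deadline",
    ],
    "deduction": ["mortgage deduction limits", "deduction for mortgage interest"],
    "return": ["return extension form", "request a return extension"],
}
STOPWORDS = {"a", "for", "the", "of"}


def search(entity):
    """Stand-in for a web search call (assumption: returns text snippets)."""
    return FAKE_RESULTS.get(entity, [])


def common_terms(entity, snippets, min_support=2):
    """Terms occurring in at least `min_support` snippets, excluding the entity itself."""
    counts = Counter()
    for snippet in snippets:
        # Count each term at most once per snippet.
        counts.update(set(snippet.lower().split()) - STOPWORDS - {entity})
    return sorted(term for term, n in counts.items() if n >= min_support)


def build_taxonomy(seeds, iterations=2):
    """Grow a parent -> children mapping outward from the seed entities."""
    taxonomy = {}
    seen = set(seeds)
    frontier = list(seeds)
    for _ in range(iterations):
        next_frontier = []
        for entity in frontier:
            # Shared terms become parameters (children) of the current entity...
            children = [t for t in common_terms(entity, search(entity)) if t not in seen]
            seen.update(children)
            taxonomy[entity] = children
            # ...and are treated as entities in their own right on the next pass.
            next_frontier.extend(children)
        frontier = next_frontier
    return taxonomy


taxonomy = build_taxonomy(["tax"])
```

With the canned data, the seed `tax` acquires the children `deduction` and `return` in the first iteration, and each of those is expanded in the second. A real system would replace `common_terms` with a learned generalisation step and `search` with live queries.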