MeCab


MeCab is an open-source text segmentation library for Japanese-language text. It was originally developed by the Nara Institute of Science and Technology and is currently maintained by Taku Kudou as part of his work on the Google Japanese Input project. The name derives from the developer's favorite food, mekabu (和布蕪), a Japanese dish made from wakame leaves.
The software was originally based on ChaSen and was developed under the name ChaSenTNG, but it has since been rewritten from scratch and is now developed independently of ChaSen. MeCab's analysis accuracy is comparable to ChaSen's, and it is on average three to four times faster.
MeCab performs morphological analysis: it segments a sentence into words and identifies the part of speech of each. Several dictionaries are available for MeCab; as with ChaSen, IPADIC is the most commonly used.
In 2007, Google used MeCab to generate n-gram data for a large corpus of Japanese text, which it published on its Google Japan blog.
MeCab is also used for Japanese input on Mac OS X 10.5 and 10.6, and in iOS since version 2.1.

Example

Besides segmenting the text, MeCab lists each word's part of speech and, where applicable and available in the dictionary, its pronunciation. For example, the verb できる (dekiru) is classified as an ichidan verb in its basic (dictionary) form, and the word でも is identified as an adverbial particle. Because not every column applies to every word, an asterisk is used in place of any column that does not apply; this allows the information following the word and the tab character to be written as comma-separated values.
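The column layout described above can be illustrated with a short parsing sketch. It assumes the IPADIC feature order (part of speech, three subcategories, conjugation type, conjugation form, base form, reading, pronunciation) and uses a few hand-written sample lines in that layout rather than captured MeCab output; the names `SAMPLE_OUTPUT` and `parse_mecab` are hypothetical. Because the features after the tab are plain comma-separated values, Python's standard `csv` module can split them, and each asterisk is mapped to `None`.

```python
import csv
from typing import Optional

# Illustrative lines in MeCab's default format with the IPADIC column layout.
# These were written by hand for this sketch; they are not captured output.
SAMPLE_OUTPUT = """\
誰\t名詞,代名詞,一般,*,*,*,誰,ダレ,ダレ
でも\t助詞,副助詞,*,*,*,*,でも,デモ,デモ
できる\t動詞,自立,*,*,一段,基本形,できる,デキル,デキル
EOS
"""


def parse_mecab(output: str) -> list[tuple[str, list[Optional[str]]]]:
    """Return (surface, features) pairs; '*' columns become None."""
    tokens = []
    for line in output.splitlines():
        # "EOS" marks the end of a sentence and carries no features.
        if not line or line == "EOS":
            continue
        # The surface form is separated from the features by a single tab.
        surface, _, feature_str = line.partition("\t")
        # The feature part is ordinary CSV, so the csv module can split it.
        features = next(csv.reader([feature_str]))
        tokens.append((surface, [None if f == "*" else f for f in features]))
    return tokens


for surface, features in parse_mecab(SAMPLE_OUTPUT):
    # Index 0: part of speech; 4: conjugation type; 5: conjugation form.
    print(surface, features[0], features[4], features[5])
```

Mapping asterisks to `None` keeps "not applicable" distinct from an actual value, so code consuming the parsed records does not mistake the placeholder for data.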
MeCab supports several output formats. One of them, chasen, outputs tab-separated values in a format that programs written for ChaSen can read. Another, yomi, outputs the pronunciation of the input text in katakana.
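The effect of the yomi format can be approximated from the default output: joining the reading column of each word yields a katakana rendering of the sentence. This is a hypothetical sketch, not MeCab's implementation. It assumes the IPADIC layout, in which the reading is the eighth feature (index 7), and it falls back to the surface form when a word has no reading, as can happen for words not in the dictionary. The sample lines are hand-written for illustration.

```python
import csv

# Hand-written sample in MeCab's default format (IPADIC layout), not real output.
SAMPLE_OUTPUT = """\
誰\t名詞,代名詞,一般,*,*,*,誰,ダレ,ダレ
でも\t助詞,副助詞,*,*,*,*,でも,デモ,デモ
できる\t動詞,自立,*,*,一段,基本形,できる,デキル,デキル
EOS
"""

READING_INDEX = 7  # assumption: reading is the 8th IPADIC feature column


def to_yomi(output: str) -> str:
    """Concatenate each word's katakana reading into one string."""
    parts = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature_str = line.partition("\t")
        features = next(csv.reader([feature_str]))
        # Use the reading when present; otherwise fall back to the surface form.
        if len(features) > READING_INDEX and features[READING_INDEX] != "*":
            parts.append(features[READING_INDEX])
        else:
            parts.append(surface)
    return "".join(parts)


print(to_yomi(SAMPLE_OUTPUT))  # ダレデモデキル
```

With MeCab itself installed, the built-in format is selected on the command line with the `-O` option (for example `mecab -Oyomi`), which makes a post-processing step like this unnecessary.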