Babel program


The IARPA Babel program developed speech recognition technology for noisy telephone conversations. The main goal of the program was to improve the performance of keyword search on languages with very little transcribed data, i.e., low-resource languages. Data from 26 languages was collected, with certain languages held out as "surprise" languages to test the ability of the teams to rapidly build a system for a new language.
The program began in 2012, with two industry-led teams and two university-led teams participating.
Some of the funding from Babel was used to further develop the Kaldi toolkit. The speech data was later made available through the Linguistic Data Consortium.