Andrew McCallum


Andrew McCallum is a professor and researcher in the computer science department at the University of Massachusetts Amherst. His primary specialties are machine learning, natural language processing, information extraction, information integration, and social network analysis.

Career

McCallum graduated summa cum laude from Dartmouth College in 1989. He completed his Ph.D. at the University of Rochester in 1995 under the supervision of Dana H. Ballard. He was then a postdoctoral fellow at Carnegie Mellon University, working with Sebastian Thrun and Tom M. Mitchell. From 1998 to 2000 he was a Research Scientist and Research Coordinator at Justsystem Pittsburgh Research Center. From 2000 to 2002 he was Vice President of Research and Development at WhizBang Labs and Director of its Pittsburgh office. Since 2002, he has been a professor of computer science at the University of Massachusetts Amherst.
He was elected a fellow of the Association for the Advancement of Artificial Intelligence in 2009 and a fellow of the Association for Computing Machinery in 2017. From 2014 to 2017 he was President of the International Machine Learning Society, which organizes the International Conference on Machine Learning. He is also the director of the Center for Data Science at UMass, where he leads a partnership with the Chan Zuckerberg Initiative. In 2018, the initiative made an initial grant of $5.5 million to the center to support research on new ways for scientists to explore and discover research articles.

Main contributions

In collaboration with John Lafferty and Fernando Pereira, McCallum developed conditional random fields, first described in a paper presented at the International Conference on Machine Learning (ICML) in 2001. In 2011, this paper won the ICML "Test of Time" award.
McCallum has written several widely used open-source software toolkits for machine learning, natural language processing, and other text processing, including Rainbow, MALLET, and FACTORIE. In addition, he was instrumental in publishing the Enron Corpus, a large collection of emails that has served as the basis for a number of academic studies of social networks and language.