Natural Language Processing for Collective Discourse
Natural Language Processing (NLP) has become very popular in recent years thanks to new technologies like IBM's Watson, Apple's Siri, Google Translate, and Yahoo's text summarization system. One of the fundamental challenges in NLP is to automatically recognize similar words and sentences. I will talk about research done in the Computational Linguistics And Information Retrieval lab (CLAIR) on graph-based methods for similarity recognition and their applications to NLP tasks. These projects are related to Collective Discourse (text collections produced by large numbers of users) and its inherent properties, such as centrality and diversity. In the first project, we team up with The New Yorker magazine. Each week, a captionless cartoon is published in the magazine, and thousands of readers try to come up with funny captions for it. In our work, we try to uncover the topics of the jokes in the submitted captions. The second project analyzes a corpus of word clues used in New York Times crossword puzzles. We compare different clustering methods for word sense disambiguation using these crossword clues. The third project is about the automatic generation of citation-based summaries of research articles. These summaries describe what readers of the papers find most important in the cited papers. If time permits, I will also briefly mention some applications to political science and social network analysis.
Dragomir Radev is a Professor of Information and Professor of Computer Science and Engineering at the University of Michigan. He is also affiliated with the Michigan Institute for Data Science (MIDAS) and the Department of Linguistics. Dragomir has a PhD in Computer Science from Columbia University. Dragomir's research is in Natural Language Processing, Applied Machine Learning, and Information Retrieval. He works in the fields of text summarization, lexical semantics, sentiment analysis, open domain question answering, and the application of NLP to other areas such as Bioinformatics and Political Science. Dragomir is the past secretary of ACL (Association for Computational Linguistics).
Dragomir is also a co-founder of the North American Computational Linguistics Olympiad (NACLO) and the coach of the US team for the International Linguistics Olympiad. Dragomir has close to 200 international publications as well as three patents. Dragomir has worked or consulted for IBM, Yahoo, Microsoft, AT&T, and other companies. He has been funded by a number of sources including NIH, IBM, NSF, DARPA, and IARPA. In 2013, Dragomir received the University of Michigan's Distinguished Faculty Award. He is an associate editor of JAIR. Dragomir became a Fellow of the Association for Computing Machinery (ACM) in 2015.