Computer Science Theory Seminar
Gabor Berend
University of Szeged
Determining sparse word representations in monolingual and multilingual settings
Abstract: Continuous representations have superseded symbolic ones in practically all natural language processing (NLP) applications. Although popular continuous representations can solve various NLP tasks at close to human performance, the representations employed in most recent NLP frameworks bear little resemblance to human cognition. In this talk, we review algorithms for obtaining continuous meaning representations of natural language, then propose an approach for distilling symbolic features from them in a way that also conveys human-interpretable, commonsense knowledge. We additionally present experimental results suggesting that the symbolic features distilled from continuous representations via sparse coding can be used to train standard statistical models that perform comparably to more expensive and less interpretable neural models. Finally, we introduce an efficient algorithm for constructing multilingual sparse word representations, opening up the possibility of zero-shot learning across languages.
Tuesday February 18, 2020 at 3:00 PM in 1325 SEO
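
The sparse-coding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the speaker's actual method: it uses scikit-learn's generic `DictionaryLearning` on randomly generated stand-in vectors (the embedding matrix, dimensions, and hyperparameters below are all illustrative assumptions), merely to show how dense vectors can be re-expressed as sparse combinations of dictionary atoms whose nonzero coordinates act like discrete features.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
# Toy stand-in for pretrained word embeddings: 50 "words" x 16 dimensions.
# A real experiment would load vectors such as word2vec or GloVe instead.
embeddings = rng.normal(size=(50, 16))

# Learn an overcomplete dictionary (more atoms than input dimensions);
# each word vector is then approximated as a sparse linear combination
# of the learned atoms. alpha controls the sparsity penalty.
dl = DictionaryLearning(n_components=24, alpha=1.0,
                        max_iter=50, random_state=0)
sparse_codes = dl.fit_transform(embeddings)   # shape: (50, 24)

# The indices of the nonzero coefficients can be read off as
# symbolic, human-inspectable features for each word.
for word_idx in range(3):
    active_features = np.flatnonzero(sparse_codes[word_idx])
    print(word_idx, active_features.tolist())
```

With real embeddings, each active dictionary atom tends to correlate with an interpretable property shared by the words that activate it, which is what makes the resulting features usable as input to standard statistical models.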