► Topic: Information-Theoretic Probing with Minimum Description Length
► Speaker: Elena Voita, a Ph.D. student at the University of Edinburgh and the University of Amsterdam
► Who will be interested: We will discuss improved ways of measuring what the BERT and ELMo models can learn. The talk will be of particular interest to people using pre-trained models in practice, as well as to anyone with a more general interest in model analysis and/or representation learning. Attendees will get the most out of the talk if they are familiar with these popular pre-trained models and their usefulness for downstream tasks.
► Language: English
► Registration: forms.gle/uhmCJ8cys2K2ZVyT9 The event is free with mandatory registration. Please look out for a confirmation email with all the necessary information and links to online streaming.
“How can you know whether a model (e.g., ELMo, BERT) has learned to encode a linguistic property? The most popular approach to measure how well pretrained representations encode a linguistic property is to use the accuracy of a probing classifier (probe). However, such probes often fail to adequately reflect differences in representations, and they can show different results depending on probe hyperparameters. As an alternative to standard probing, we propose information-theoretic probing which measures minimum description length (MDL) of labels given representations. In addition to probe quality, the description length evaluates ‘the amount of effort’ needed to achieve this quality. We show that (i) MDL can be easily evaluated on top of standard probe-training pipelines, and (ii) compared to standard probes, the results of MDL probing are more informative, stable, and sensible.”
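For readers who want a concrete sense of how description length can be computed "on top of standard probe-training pipelines", below is a minimal sketch of the online-coding variant of MDL probing. It assumes fixed pre-trained representations X, integer labels y in the range 0..K-1, and a simple logistic-regression probe; the data fractions, the classifier choice, and the helper name online_codelength are illustrative assumptions, not the speaker's exact setup.

import numpy as np
from sklearn.linear_model import LogisticRegression

def online_codelength(X, y, num_classes,
                      fractions=(0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.0)):
    """Total codelength (in bits) of labels y given representations X."""
    n = len(y)
    cutoffs = [max(1, int(f * n)) for f in fractions]
    # The first block is transmitted with a uniform code: log2(K) bits per label.
    total_bits = cutoffs[0] * np.log2(num_classes)
    for start, end in zip(cutoffs[:-1], cutoffs[1:]):
        # Train the probe on all data seen so far ...
        probe = LogisticRegression(max_iter=1000).fit(X[:start], y[:start])
        # ... and pay for the next block: the summed negative log2-probability
        # the probe assigns to the true labels. (For simplicity this sketch
        # assumes every class already occurs in the smallest training prefix.)
        log_probs = probe.predict_log_proba(X[start:end])
        total_bits += -log_probs[np.arange(end - start), y[start:end]].sum() / np.log(2)
    return total_bits

A lower codelength means the probe extracts the labels from the representations with less "effort"; the paper also reports compression, the uniform codelength n·log2(K) divided by the total codelength, so higher compression indicates a more accessibly encoded property.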