BEGIN:VCALENDAR
VERSION:2.0
CALSCALE:GREGORIAN
PRODID:UW-Madison-Physics-Events
BEGIN:VEVENT
SEQUENCE:0
UID:UW-Physics-Event-6402
DTSTART:20210407T160000Z
DTEND:20210407T171500Z
DTSTAMP:20260314T084504Z
LAST-MODIFIED:20210322T041911Z
LOCATION:Online Seminar: Please sign up for our mailing list at www.ph
 ysicsmeetsml.org for the Zoom link
SUMMARY:A Mathematical Exploration of Why Language Models Help Solve D
 ownstream Tasks\, Physics ∩ ML Seminar\, Nikunj Saunshi\, Princeton 
 University
DESCRIPTION:Autoregressive language models\, pretrained on large text 
 corpora to do well on next word prediction\, have been successful at s
 olving many downstream tasks\, even with zero-shot usage. However\, th
 ere is little theoretical understanding of this success. We initiate a
  mathematical study of this phenomenon for the downstream task of text
  classification by considering the following questions: (1) What is th
 e intuitive connection between the pretraining task of next word predi
 ction and text classification? (2) How can we mathematically formalize
  this connection and quantify the benefit of language modeling? For (1
 )\, we hypothesize\, and verify empirically\, that classification task
 s of interest can be reformulated as sentence completion tasks\, thus 
 making language modeling a meaningful pretraining task. With a mathema
 tical formalization of this hypothesis\, we make progress towards (2) 
 and show that language models that are \\epsilon-optimal in cross-entr
 opy (log-perplexity) learn features that can linearly solve such class
 ification tasks with O(\\sqrt{\\epsilon})-error\, thus demonstrating t
 hat doing well on language modeling can be beneficial for downstream t
 asks. We experimentally verify various assumptions and theoretical fin
 dings\, and also use insights from the analysis to design a new object
 ive function that performs well on some classification tasks.
URL:https://www.physics.wisc.edu/events/?id=6402
END:VEVENT
END:VCALENDAR
