First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training