Browsing by Subject "admixture models"
A probabilistic framework for pruning transformers via a finite admixture of keys
(2023-04-11) Pairwise dot-product-based self-attention is key to the success of transformers, which achieve state-of-the-art performance across a variety of applications in language and vision but are costly to compute. It has been ...