Browsing by Author "Osher, Stan J."
A probabilistic framework for pruning transformers via a finite admixture of keys
Nguyen, M. Tan; Nguyen, Tam; Bui, Long; Do, Hai; Nguyen, Duy Khuong; Le, Duy Dung; Tran, The Hung; Ho, Nhat; Osher, Stan J.; Baraniuk, Richard G. (2023-04-11)
Pairwise dot product-based self-attention is key to the success of transformers, which achieve state-of-the-art performance across a variety of applications in language and vision but are costly to compute. It has been ...