Browsing by Author "Baraniuk, Richard G."
Now showing items 1-3 of 3
Improving transformers with probabilistic attention keys
Le, Duy Dung; Tran, Viet Anh; Nguyen, M. Tan; Nguyen, Tam; Nguyen, Duy Khuong; Baraniuk, Richard G.; Ho, Nhat; Osher, Stanley J. (2022)
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that ...
Improving transformers with probabilistic attention keys
Nguyen, Tam; Nguyen, M. Tan; Le, D. Dung; Nguyen, Khuong Duy; Tran, Viet Anh; Baraniuk, Richard G.; Osher, Stanley J.; Ho, Nhat (2022-06-13)
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that ...
A probabilistic framework for pruning transformers via a finite admixture of keys
Nguyen, M. Tan; Nguyen, Tam; Bui, Long; Do, Hai; Nguyen, Duy Khuong; Le, Duy Dung; Tran, The Hung; Ho, Nhat; Osher, Stan J.; Baraniuk, Richard G. (2023-04-11)
Pairwise dot product-based self-attention is key to the success of transformers, which achieve state-of-the-art performance across a variety of applications in language and vision but are costly to compute. It has been ...