Browsing by Author "Baraniuk, Richard G."
Now showing items 1-3 of 3
Improving transformers with probabilistic attention keys
Le, Duy Dung; Tran, Viet Anh; Nguyen, M. Tan; Nguyen, Tam; Nguyen, Duy Khuong; Baraniuk, Richard G.; Ho, Nhat; Osher, Stanley J. (2022)
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that ...
Improving transformers with probabilistic attention keys
Nguyen, Tam; Nguyen, M. Tan; Le, D. Dung; Nguyen, Khuong Duy; Tran, Viet Anh; Baraniuk, Richard G.; Osher, Stanley J.; Ho, Nhat (2022-06-13)
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that ...
A probabilistic framework for pruning transformers via a finite admixture of keys
Nguyen, M. Tan; Nguyen, Tam; Bui, Long; Do, Hai; Nguyen, Duy Khuong; Le, Duy Dung; Tran, The Hung; Ho, Nhat; Osher, Stan J.; Baraniuk, Richard G. (2023-04-11)
Pairwise dot product-based self-attention is key to the success of transformers, which achieve state-of-the-art performance across a variety of applications in language and vision but are costly to compute. It has been ...