LoGoViT: Local-global vision transformer for object re-identification

Phan, Nguyen; Tran, Sam; Nguyen, Tran Hoang; Ta, Duc Huy; Duong, T. M. Soan; Nguyen, D. Tr. Chanh; Dao, Huu Hung; Bui, Trung; Truong, Q. H. Steven

View/Open

ICASSP2023.pdf (537.6Kb)

Year of publication

2023-06

Abstract

Object re-identification (ReID) is prone to errors under variations in scale and illumination, complex backgrounds, and object occlusion. To overcome these challenges, attention mechanisms are employed to concentrate on the informative parts of an object and extract more discriminative features. This paper introduces the local-global vision transformer (LoGoViT) for object re-identification, which learns a hierarchical representation ranging from fine-grained (local) to general (global) context features. It comprises two components: (i) shift and shuffle operations that generate robust local features, and (ii) a local-global module that aggregates the multi-level hierarchical features of an object. Extensive experiments show that our method achieves state-of-the-art results on ReID benchmarks. We further investigate effective augmentation operations and discuss how patch modifications can help the model generalize under occlusion. Our code is available at https://github.com/nguyenphan99/LoGoViT
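
The shift and shuffle operations named in the abstract can be pictured with a short sketch. This is not taken from the authors' repository: it assumes a patch rearrangement in the style commonly used in transformer-based ReID, where the token sequence is cyclically shifted and then dealt into groups so that each local group mixes patches from distant image regions. The function name `shift_and_shuffle` and the parameters `shift` and `groups` are hypothetical and chosen only for illustration.

```python
import numpy as np


def shift_and_shuffle(tokens, shift=2, groups=4):
    """Illustrative patch rearrangement (an assumption, not the paper's code).

    tokens: array of shape (num_patches, dim), with any class token excluded.
    Returns an array of shape (groups, num_patches // groups, dim).
    """
    n, dim = tokens.shape
    if n % groups != 0:
        raise ValueError("num_patches must be divisible by groups")
    # Shift: move the first `shift` patches to the end of the sequence.
    shifted = np.roll(tokens, -shift, axis=0)
    # Shuffle: deal patches round-robin into `groups` groups, so group g
    # holds shifted patches g, g + groups, g + 2 * groups, ...
    return shifted.reshape(n // groups, groups, dim).transpose(1, 0, 2)


# Usage: 8 one-dimensional "patches" labelled 0..7, split into 4 groups.
patches = np.arange(8).reshape(8, 1)
local_groups = shift_and_shuffle(patches, shift=2, groups=4)
```

Each resulting group could then be passed through a shared transformer layer to produce one local feature, while the unmodified sequence feeds the global branch; how LoGoViT actually combines the two is described in the paper and the linked code, not here.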

Identifier

https://vinspace.edu.vn/handle/VIN/566

Collections

Nguyen Do Trung Chanh, PhD [11]