Efficient human vision inspired action recognition using adaptive spatiotemporal sampling
dc.contributor.author | Mac, C. Khoi Nguyen | |
dc.contributor.author | Do, N. Minh | |
dc.contributor.author | Vo, P. Minh | |
dc.date.accessioned | 2025-02-22T19:07:25Z | |
dc.date.available | 2025-02-22T19:07:25Z | |
dc.date.issued | 2022-07-14 | |
dc.identifier.uri | https://vinspace.edu.vn/handle/VIN/577 | |
dc.description.abstract | Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, which adversely impacts both computational efficiency and accuracy. Inspired by the concepts of foveal vision and pre-attentive processing from the human visual perception mechanism, we introduce a novel adaptive spatiotemporal sampling scheme for efficient action recognition. Our system pre-scans the global scene context at low resolution and decides whether to skip or to request high-resolution features at salient regions for further processing. We validate the system on the EPIC-KITCHENS and UCF-101 datasets for action recognition, and show that our proposed approach can greatly speed up inference with a tolerable loss of accuracy compared with state-of-the-art baselines. Source code is available at https://github.com/knmac/adaptive_spatiotemporal. | en_US |
dc.language.iso | en_US | en_US |
dc.title | Efficient human vision inspired action recognition using adaptive spatiotemporal sampling | en_US |
dc.type | Article | en_US |
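
The pre-scan-then-decide loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): the saliency measure here is a plain frame difference on the low-resolution view, and the names `downsample`, `plan_frame`, `factor`, and `skip_threshold` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]          # grayscale image as rows of pixel intensities
Crop = Tuple[int, int, int, int]   # (top, left, height, width) in full-res pixels


def downsample(frame: Frame, factor: int) -> Frame:
    """Average-pool the frame by `factor` to get a cheap low-resolution view."""
    rows, cols = len(frame) // factor, len(frame[0]) // factor
    return [
        [
            sum(frame[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def plan_frame(prev_low: Optional[Frame], frame: Frame, factor: int = 4,
               skip_threshold: float = 0.1) -> Tuple[Frame, Optional[Crop]]:
    """Pre-scan at low resolution and return (low_res_view, crop).

    crop is None when the frame can be skipped; otherwise it is the region
    for which high-resolution features should be requested.
    """
    low = downsample(frame, factor)
    if prev_low is None:
        # Assumption: with no history, request the whole first frame.
        return low, (0, 0, len(frame), len(frame[0]))
    # Assumed saliency: absolute change of each low-res cell since last frame.
    best, best_rc = 0.0, (0, 0)
    for r, row in enumerate(low):
        for c, value in enumerate(row):
            change = abs(value - prev_low[r][c])
            if change > best:
                best, best_rc = change, (r, c)
    if best < skip_threshold:
        return low, None  # scene looks redundant: skip high-res processing
    r, c = best_rc
    # Map the most salient low-res cell back to full-resolution coordinates.
    return low, (r * factor, c * factor, factor, factor)
```

In this sketch a static scene yields `None` (skip), while a local change yields a single high-resolution crop around the changed cell. A learned saliency or policy module would replace the frame-difference heuristic in a real system.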
This item appears in the following Collection(s)
- Minh Do, PhD. [6]