University of Exeter

Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning

conference contribution
posted on 2025-08-01, 12:50 authored by F Alamri, A Dutta
Zero-Shot Learning (ZSL) aims to recognise unseen object classes that are not observed during the training phase. The existing body of work on ZSL mostly relies on pretrained visual features and lacks an explicit attribute-localisation mechanism on images. In this work, we propose an attention-based model in the ZSL setting to learn attributes useful for unseen-class recognition. Our method uses an attention mechanism adapted from the Vision Transformer to capture and learn discriminative attributes by splitting images into small patches. We conduct experiments on three popular ZSL benchmarks (AWA2, CUB and SUN) and set new state-of-the-art harmonic-mean results on all three datasets, illustrating the effectiveness of the proposed method.
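The core mechanism the abstract describes — multi-head self-attention applied to a sequence of image patches — can be sketched as follows. This is an illustrative numpy sketch of generic ViT-style attention, not the authors' implementation: the random projection weights, patch size, and dimensions are all assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(patches, num_heads, rng):
    """Plain multi-head self-attention over patch embeddings.

    patches: (n, d) array, one row per image patch.
    Returns (output, attention) of shapes (n, d) and (num_heads, n, n).
    """
    n, d = patches.shape
    dh = d // num_heads                       # per-head dimension
    # Randomly initialised projections, stand-ins for learned weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split_heads(x, W):                    # (n, d) -> (num_heads, n, dh)
        return (x @ W).reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split_heads(patches, Wq), split_heads(patches, Wk), split_heads(patches, Wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))    # (heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d) @ Wo    # concat heads, project
    return out, attn

# Example: a 32x32 grey image split into sixteen 8x8 patches, flattened to d=64.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
patches = image.reshape(4, 8, 4, 8).transpose(0, 2, 1, 3).reshape(16, 64)
out, attn = multi_head_self_attention(patches, num_heads=4, rng=rng)
```

Each row of `attn` is a distribution over all patches, which is what allows attribute localisation to be read off the attention maps.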

Funding

Alan Turing Institute

Defence Science and Technology Laboratory

History

Related Materials

Rights

© 2021 Irish Pattern Recognition and Classification Society

Notes

This is the author accepted manuscript.

Publisher

Irish Pattern Recognition and Classification Society

Version

  • Accepted Manuscript

Language

en

FCD date

2021-07-31T12:02:20Z

FOA date

2021-09-03T23:00:00Z

Citation

IMVIP 2021: Irish Machine Vision and Image Processing Conference, 1 - 3 September 2021, Dublin, Ireland

Department

  • Computer Science
