dc.contributor.author | Dey, S | |
dc.contributor.author | Dutta, A | |
dc.contributor.author | Ghosh, SK | |
dc.contributor.author | Valveny, E | |
dc.contributor.author | Lladós, J | |
dc.contributor.author | Pal, U | |
dc.date.accessioned | 2019-10-15T09:07:45Z | |
dc.date.issued | 2019-06-02 | |
dc.description.abstract | In this paper we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities in a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and their alignment with the query irrespective of the availability of the co-occurrence of different objects in the training set. We validate the performance of our approach on standard single/multi-object datasets, showing state-of-the-art performance on every dataset. | en_GB |
dc.description.sponsorship | European Union Horizon 2020 | en_GB |
dc.description.sponsorship | CERCA Program of Generalitat de Catalunya | en_GB |
dc.identifier.citation | Vol. 11362, pp. 241 - 255 | en_GB |
dc.identifier.doi | 10.1007/978-3-030-20890-5_16 | |
dc.identifier.grantnumber | 665919 | en_GB |
dc.identifier.grantnumber | TIN2015-70924-C2-2-R | en_GB |
dc.identifier.grantnumber | TIN2014-52072-P | en_GB |
dc.identifier.uri | http://hdl.handle.net/10871/39196 | |
dc.language.iso | en | en_GB |
dc.publisher | Springer Verlag | en_GB |
dc.rights | © Springer Nature Switzerland AG 2019 | en_GB |
dc.title | Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework | en_GB |
dc.type | Conference paper | en_GB |
dc.date.available | 2019-10-15T09:07:45Z | |
dc.identifier.isbn | 9783030208899 | |
dc.identifier.issn | 0302-9743 | |
dc.description | This is the author accepted manuscript. The final version is available from Springer Verlag via the DOI in this record | en_GB |
dc.description | ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, 2-6 December 2018 | |
dc.identifier.journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en_GB |
dc.rights.uri | http://www.rioxx.net/licenses/all-rights-reserved | en_GB |
rioxxterms.version | AM | en_GB |
rioxxterms.licenseref.startdate | 2019-06-02 | |
rioxxterms.type | Conference Paper/Proceeding/Abstract | en_GB |
refterms.dateFCD | 2019-10-15T09:03:18Z | |
refterms.versionFCD | AM | |
refterms.dateFOA | 2019-10-15T09:07:51Z | |
refterms.panel | B | en_GB |