Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework

Dey, S; Dutta, A; Ghosh, SK; Valveny, E; Lladós, J; Pal, U

dc.contributor.author	Dey, S
dc.contributor.author	Dutta, A
dc.contributor.author	Ghosh, SK
dc.contributor.author	Valveny, E
dc.contributor.author	Lladós, J
dc.contributor.author	Pal, U
dc.date.accessioned	2019-10-15T09:07:45Z
dc.date.issued	2019-06-02
dc.description.abstract	In this paper, we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities in a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and their alignment with the query irrespective of whether the different objects co-occur in the training set. We validate the performance of our approach on standard single/multi-object datasets, showing state-of-the-art performance on every dataset.	en_GB
dc.description.sponsorship	European Union Horizon 2020	en_GB
dc.description.sponsorship	CERCA Program of Generalitat de Catalunya	en_GB
dc.identifier.citation	Vol. 11362, pp. 241 - 255	en_GB
dc.identifier.doi	10.1007/978-3-030-20890-5_16
dc.identifier.grantnumber	665919	en_GB
dc.identifier.grantnumber	TIN2015-70924-C2-2-R	en_GB
dc.identifier.grantnumber	TIN2014-52072-P	en_GB
dc.identifier.uri	http://hdl.handle.net/10871/39196
dc.language.iso	en	en_GB
dc.publisher	Springer Verlag	en_GB
dc.rights	© Springer Nature Switzerland AG 2019	en_GB
dc.title	Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework	en_GB
dc.type	Conference paper	en_GB
dc.date.available	2019-10-15T09:07:45Z
dc.identifier.isbn	9783030208899
dc.identifier.issn	0302-9743
dc.description	This is the author accepted manuscript. The final version is available from Springer Verlag via the DOI in this record	en_GB
dc.description	ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, 2-6 December 2018
dc.identifier.journal	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_GB
dc.rights.uri	http://www.rioxx.net/licenses/all-rights-reserved	en_GB
rioxxterms.version	AM	en_GB
rioxxterms.licenseref.startdate	2019-06-02
rioxxterms.type	Conference Paper/Proceeding/Abstract	en_GB
refterms.dateFCD	2019-10-15T09:03:18Z
refterms.versionFCD	AM
refterms.dateFOA	2019-10-15T09:07:51Z
refterms.panel	B	en_GB

Files in this item

Name: ACCV2018_MMIR.pdf
Size: 12.42Mb
Format: PDF

This item appears in the following Collection(s)

Computer Science
