dc.contributor.author: He, S
dc.date.accessioned: 2021-07-28T08:41:09Z
dc.date.issued: 2021-08-02
dc.description.abstract: Thanks to deep learning, computer vision has advanced considerably. The attention mechanism, inspired by the human visual system, is a versatile module that is widely applied in current deep computer vision models and strengthens their power. However, most attention models are trained end-to-end. Why and how do these attention models work? How similar is the trained attention to the human attention that inspired it? These questions remain open, and this hinders the design of better attention models, architectures, and algorithms that could further advance computer vision. In this thesis, we aim to unravel these mysteries by studying attention mechanisms in computer vision in the deep learning era. In the first part of this thesis, we study bottom-up attention. Under the umbrella of saliency prediction, bottom-up attention has progressed substantially with the help of deep learning. However, deep saliency models remain a black box, and their performance has reached a ceiling. The first part of this thesis therefore aims to understand what happens inside a deep model when it is trained for saliency prediction. Concretely, we dissect each individual unit inside a deep model trained for saliency prediction. Our analysis discloses how deep models predict saliency, exposes their limitations, and gives new insights for future saliency modelling. In the second part, we study top-down attention in computer vision. Top-down attention, a mechanism that usually builds on top of bottom-up attention, has achieved great success in many computer vision tasks. However, this success raises an interesting question: is learned top-down attention similar to human attention under the same task? To answer this question, we collected a dataset that records human attention during the image captioning task. Using this dataset, we analyse how the attention exploited by a deep image captioning model differs from human attention on the same task. Our research shows that the widely used soft attention mechanism differs from human attention on this task. Meanwhile, we use human attention as prior knowledge to help a machine perform better at image captioning. In the third part, we study contextual attention. It is complementary to both bottom-up and top-down attention, contextualizing each informative region with attention. Prior contextual attention methods either adopt contextual modules from natural language processing, which are only suitable for 1-D sequential inputs, or rely on complex two-stream graph neural networks. Motivated by the difference in semantic units between sentences and images, we design a transformer-based architecture for image captioning. Our design widens the original transformer layer by using 2-D spatial relationships and achieves competitive performance on image captioning. [en_GB]
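For context on the "soft attention" that the abstract contrasts with human attention, below is a minimal sketch of additive soft attention over image regions, in the spirit of Show, Attend and Tell; the function name, shapes, and random parameters are illustrative assumptions, not the implementation studied in the thesis.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_attention(regions, hidden, W_r, W_h, v):
    """Additive soft attention over image regions (illustrative sketch).

    regions: (k, d) region feature vectors.
    hidden:  (h,)   decoder hidden state.
    W_r (d, a), W_h (h, a), v (a,): projection parameters,
    random here purely for illustration.

    Returns (alpha, context): non-negative weights over the k regions
    that sum to 1, and their weighted sum.
    """
    scores = np.tanh(regions @ W_r + hidden @ W_h) @ v  # (k,) relevance scores
    alpha = softmax(scores)                             # (k,) attention weights
    context = alpha @ regions                           # (d,) context vector
    return alpha, context

# Toy usage with made-up dimensions.
rng = np.random.default_rng(0)
k, d, h, a = 5, 8, 6, 4  # regions, feature dim, hidden dim, attention dim
alpha, context = soft_attention(
    rng.normal(size=(k, d)),   # region features
    rng.normal(size=h),        # decoder state
    rng.normal(size=(d, a)),
    rng.normal(size=(h, a)),
    rng.normal(size=a),
)
print(alpha.round(3), alpha.sum())  # weights over the 5 regions, summing to 1
```

In a trained captioning model the parameters W_r, W_h, and v are learned end-to-end, which is exactly why the resulting attention maps need not resemble human attention, the question the second part of the thesis investigates.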
dc.identifier.uri: http://hdl.handle.net/10871/126588
dc.publisher: University of Exeter [en_GB]
dc.rights.embargoreason: Some extended works are under review. [en_GB]
dc.title: Attention in Computer Vision [en_GB]
dc.type: Thesis or dissertation [en_GB]
dc.date.available: 2021-07-28T08:41:09Z
dc.contributor.advisor: Pugeault, N [en_GB]
dc.publisher.department: Computer Sciences [en_GB]
dc.rights.uri: http://www.rioxx.net/licenses/all-rights-reserved [en_GB]
dc.type.degreetitle: PhD in Computer Sciences [en_GB]
dc.type.qualificationlevel: Doctoral [en_GB]
dc.type.qualificationname: Doctoral Thesis [en_GB]
rioxxterms.version: NA [en_GB]
rioxxterms.licenseref.startdate: 2021-07-27
rioxxterms.type: Thesis [en_GB]
refterms.dateFOA: 2021-07-28T08:44:34Z

