Show simple item record

dc.contributor.author: Pérez, GV
dc.contributor.author: Camargo, CQ
dc.contributor.author: Louis, AA
dc.date.accessioned: 2021-01-05T13:35:06Z
dc.date.issued: 2019-05-09
dc.description.abstract: Deep neural networks (DNNs) generalize remarkably well without explicit regularization, even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus on the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks trained on CIFAR10 and MNIST. As the target functions in many real problems are expected to be highly structured, this intrinsic simplicity bias helps explain why deep networks generalize well on real-world problems. This picture also facilitates a novel PAC-Bayes approach where the prior is taken over the DNN input-output function space, rather than the more conventional prior over parameter space. If we assume that the training algorithm samples parameters close to uniformly within the zero-error region, then the PAC-Bayes theorem can be used to guarantee good expected generalization for target functions producing high-likelihood training sets. By exploiting recently discovered connections between DNNs and Gaussian processes to estimate the marginal likelihood, we produce relatively tight PAC-Bayes generalization error bounds which correlate well with the true error on realistic datasets such as MNIST and CIFAR10, and for architectures including convolutional and fully connected networks.
dc.identifier.citation: ICLR 2019: Seventh International Conference on Learning Representations, 6-9 May 2019, New Orleans, Louisiana, US
dc.identifier.uri: http://hdl.handle.net/10871/124307
dc.language.iso: en
dc.publisher: ICLR
dc.relation.url: https://iclr.cc/Conferences/2019/Schedule
dc.rights: © 2019 ICLR
dc.subject: generalization
dc.subject: deep learning theory
dc.subject: PAC-Bayes
dc.subject: Gaussian processes
dc.subject: parameter-function map
dc.subject: simplicity bias
dc.title: Deep learning generalizes because the parameter-function map is biased towards simple functions
dc.type: Conference paper
dc.date.available: 2021-01-05T13:35:06Z
dc.description: This is the final version. Available from ICLR via the link in this record
dc.rights.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.version: VoR
rioxxterms.licenseref.startdate: 2019-05-09
rioxxterms.type: Conference Paper/Proceeding/Abstract
refterms.dateFCD: 2021-01-05T13:31:17Z
refterms.versionFCD: VoR
refterms.dateFOA: 2021-01-05T13:35:12Z
refterms.panel: B
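As a rough illustration of the PAC-Bayes approach described in the abstract above (this sketch is not part of the repository record, and the exact constants used in the paper may differ), a realizable PAC-Bayes bound of the Langford-Seeger type can be stated as follows. Here P denotes the prior over the network's input-output functions, U a training set of m examples, P(U) the marginal likelihood (the total prior probability of all functions consistent with U), Q the posterior obtained by conditioning P on zero training error, and \epsilon(Q) the expected generalization error of Q; the bound holds with probability at least 1-\delta over the draw of the training set:

\[
  \ln\frac{1}{1-\epsilon(Q)} \;\le\; \frac{\ln\frac{1}{P(U)} + \ln\frac{2m}{\delta}}{m-1}
\]

Read this way, the bound is tight precisely when the training data has high marginal likelihood under the function-space prior, which is the sense in which the simplicity bias described in the abstract is claimed to yield good generalization on structured targets; the DNN-Gaussian-process correspondence mentioned there then serves as a practical device for estimating P(U).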

