Improving neural networks by preventing co-adaptation of feature detectors

Jul 3, 2012
Abstract:
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
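The procedure the abstract describes — zeroing each hidden unit with probability 0.5 on every training case, then using all units at test time with activations scaled by the keep probability — can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and NumPy-based interface are my own assumptions.

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True, rng=None):
    """Illustrative sketch of random "dropout" (hypothetical helper).

    During training, each unit is independently zeroed with probability
    p_drop, so a unit cannot rely on specific other units being present.
    At test time nothing is dropped; instead activations are scaled by
    the keep probability (1 - p_drop), matching their expected value
    under training-time dropout.
    """
    if not train:
        # "Mean network" at test time: no units dropped, outputs scaled.
        return activations * (1.0 - p_drop)
    rng = rng or np.random.default_rng()
    # Binary mask: True (keep) with probability 1 - p_drop.
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask
```

In use, the mask is resampled for every training case (or mini-batch), e.g. `hidden = dropout(np.maximum(0.0, x @ W), p_drop=0.5, train=True)`, so each forward pass trains a different thinned sub-network.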