Research Article

Automated learning rate search using batch-level cross-validation

Volume: 4 Number: 3 December 31, 2021

Abstract

Deep learning researchers and practitioners have accumulated significant experience in training a wide variety of architectures on various datasets. However, given a network architecture and a dataset, obtaining the best model (i.e., the model giving the smallest test set error) while keeping the training time complexity low is still a challenging task. Hyper-parameters of deep neural networks, especially the learning rate and its (decay) schedule, strongly affect the network's final performance. The general approach is to search for the best learning rate and learning rate decay parameters within a cross-validation framework, a process that usually requires a significant amount of experimentation at extensive time cost. In classical cross-validation (CV), a random part of the dataset is reserved for evaluating model performance on unseen data. This procedure is usually run multiple times, with different random validation sets, to decide the learning rate settings. In this paper, we explore batch-level cross-validation as an alternative to the classical dataset-level, hence macro, CV. The advantage of batch-level, or micro, CV methods is that the gradient computed during training is reused to evaluate several different learning rates. We propose an algorithm based on micro CV and stochastic gradient descent with momentum, which automatically produces a learning rate schedule during training by selecting a learning rate per epoch. In our algorithm, a random half of the current batch (of examples) is used for training and the other half is used for validating several different step sizes or learning rates. We conducted comprehensive experiments on three datasets (CIFAR10, SVHN and Adience) using three different network architectures (a custom CNN, ResNet and VGG) to compare the performance of our micro-CV algorithm with the widely used stochastic gradient descent with momentum in an early-stopping macro-CV setup. The results show that our micro-CV algorithm achieves comparable test accuracy to macro-CV at a much lower computational cost.
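As a rough illustration of the batch-level (micro) CV idea described in the abstract, the sketch below applies it to a toy linear least-squares model in plain NumPy: the gradient is computed once on the training half of each batch and then reused to score several candidate learning rates on the held-out half. This is only a minimal sketch, not the authors' implementation; the function name `micro_cv_step`, the toy model, and the candidate learning-rate set are illustrative, and the paper's full algorithm additionally uses momentum and selects one learning rate per epoch rather than per batch.

```python
import numpy as np

def micro_cv_step(w, X_batch, y_batch, candidate_lrs, rng):
    """One batch-level (micro) CV update for a linear least-squares model.

    A random half of the batch trains; the other half scores each
    candidate learning rate, and the best-scoring step is kept.
    """
    n = len(X_batch)
    idx = rng.permutation(n)
    tr, va = idx[: n // 2], idx[n // 2:]

    # Gradient of the mean squared error on the training half; it is
    # computed once and reused for every candidate learning rate --
    # the key computational saving of micro CV.
    residual = X_batch[tr] @ w - y_batch[tr]
    grad = 2.0 * X_batch[tr].T @ residual / len(tr)

    # Score each candidate step size on the held-out half of the batch.
    def val_loss(w_new):
        return float(np.mean((X_batch[va] @ w_new - y_batch[va]) ** 2))

    best_w, best_lr = min(
        ((w - lr * grad, lr) for lr in candidate_lrs),
        key=lambda pair: val_loss(pair[0]),
    )
    return best_w, best_lr
```

On a synthetic regression problem, repeatedly calling `micro_cv_step` drives the loss down while adapting the step size per batch, without a separate validation set.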


Details

Primary Language

English

Subjects

Artificial Intelligence

Journal Section

Research Article

Authors

Duygu Kabakçı
0000-0001-6636-813X
Türkiye

Emre Akbaş
Türkiye

Publication Date

December 31, 2021

Submission Date

May 10, 2021

Acceptance Date

November 4, 2021

Published in Issue

Year 2021 Volume: 4 Number: 3

APA
Kabakçı, D., & Akbaş, E. (2021). Automated learning rate search using batch-level cross-validation. Sakarya University Journal of Computer and Information Sciences, 4(3), 312-325. https://doi.org/10.35377/saucis.935353
AMA
1. Kabakçı D, Akbaş E. Automated learning rate search using batch-level cross-validation. SAUCIS. 2021;4(3):312-325. doi:10.35377/saucis.935353
Chicago
Kabakçı, Duygu, and Emre Akbaş. 2021. “Automated Learning Rate Search Using Batch-Level Cross-Validation”. Sakarya University Journal of Computer and Information Sciences 4 (3): 312-25. https://doi.org/10.35377/saucis.935353.
EndNote
Kabakçı D, Akbaş E (December 1, 2021) Automated learning rate search using batch-level cross-validation. Sakarya University Journal of Computer and Information Sciences 4 3 312–325.
IEEE
[1] D. Kabakçı and E. Akbaş, “Automated learning rate search using batch-level cross-validation”, SAUCIS, vol. 4, no. 3, pp. 312–325, Dec. 2021, doi: 10.35377/saucis.935353.
ISNAD
Kabakçı, Duygu - Akbaş, Emre. “Automated Learning Rate Search Using Batch-Level Cross-Validation”. Sakarya University Journal of Computer and Information Sciences 4/3 (December 1, 2021): 312-325. https://doi.org/10.35377/saucis.935353.
JAMA
1. Kabakçı D, Akbaş E. Automated learning rate search using batch-level cross-validation. SAUCIS. 2021;4:312–325.
MLA
Kabakçı, Duygu, and Emre Akbaş. “Automated Learning Rate Search Using Batch-Level Cross-Validation”. Sakarya University Journal of Computer and Information Sciences, vol. 4, no. 3, Dec. 2021, pp. 312-25, doi:10.35377/saucis.935353.
Vancouver
1. Kabakçı D, Akbaş E. Automated learning rate search using batch-level cross-validation. SAUCIS. 2021 Dec. 1;4(3):312-25. doi:10.35377/saucis.935353

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.