Neural Network Training and Solution of Minimization Problems Via Novel Conjugate Gradient Methods
Basim A. Hassan
Abstract
In this study, we introduce a new conjugate gradient method for solving large-scale unconstrained optimization problems and for training neural networks. The method is constructed to satisfy the descent condition, which ensures both stability and efficiency, and we prove that it achieves global convergence under standard assumptions. To evaluate its effectiveness, we conducted numerical experiments on a variety of benchmark problems, including test cases from the CUTE collection as well as standard functions such as the Penalty, Sine, and Diagonal1 functions. Performance was measured by the number of iterations, the number of function evaluations, and the computational time. The results demonstrate that the proposed method outperforms the classical Hestenes-Stiefel (HS) method, converging faster with fewer iterations and at lower computational cost. We also applied the method to neural network training and compared it with the standard conjugate gradient algorithm; these experiments, carried out in MATLAB with the Neural Network Toolbox, show that our method significantly improves training efficiency, reaching a lower mean squared error in fewer epochs. Overall, the proposed conjugate gradient method offers an effective and computationally efficient approach to unconstrained optimization and neural network training, making it a promising tool for future research and real-world machine-learning applications.
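To make the comparison concrete, below is a minimal Python sketch of the classical nonlinear conjugate gradient loop with the Hestenes-Stiefel beta, the baseline the paper measures against: d_{k+1} = -g_{k+1} + beta_k d_k, with beta_k = g_{k+1}^T y_k / (d_k^T y_k) and y_k = g_{k+1} - g_k. The abstract does not state the proposed method's formula, so only the HS baseline is shown; the Rosenbrock test function, the Armijo line search, and all parameter values are illustrative assumptions, not the paper's experimental setup.

import numpy as np

def rosenbrock(x):
    # Extended Rosenbrock function, a standard unconstrained test problem.
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

def rosenbrock_grad(x):
    g = np.zeros_like(x)
    g[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1]**2) - 2.0 * (1.0 - x[:-1])
    g[1:] += 200.0 * (x[1:] - x[:-1]**2)
    return g

def armijo(f, x, d, g, alpha=1.0, rho=0.5, c=1e-4):
    # Backtracking (Armijo) line search; a simple stand-in for the Wolfe
    # search usually paired with nonlinear CG methods.
    fx, slope = f(x), g.dot(d)
    while f(x + alpha * d) > fx + c * alpha * slope and alpha > 1e-12:
        alpha *= rho
    return alpha

def cg_hs(f, grad, x0, tol=1e-6, max_iter=5000):
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    d = -g                                     # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g, np.inf) < tol:
            break
        alpha = armijo(f, x, d, g)
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g                          # y_k = g_{k+1} - g_k
        denom = d.dot(y)
        beta = g_new.dot(y) / denom if abs(denom) > 1e-12 else 0.0  # HS beta
        d = -g_new + beta * d
        if g_new.dot(d) >= 0.0:                # restart when descent is lost
            d = -g_new
        x, g = x_new, g_new
    return x, k

x_final, iters = cg_hs(rosenbrock, rosenbrock_grad, np.full(10, -1.2))
print(f"iterations: {iters}, final f(x) = {rosenbrock(x_final):.3e}")

The restart step enforces the descent condition g_{k+1}^T d_{k+1} < 0 that the abstract emphasizes; a modified method like the one the paper proposes would typically build that descent property into the beta formula itself, removing the need for such a safeguard.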
References
[1] Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1 (pp. 318–362). Cambridge, MA: MIT Press.
[2] Guo, Y., Kasihmuddin, M.S.M., Zamri, N.E., Li, J., Romli, N.A., Mansor, M.A., & Ruzai, W.N.A. (2025). Logic mining method via hybrid discrete Hopfield neural network. Computers & Industrial Engineering, 206, 111200. https://doi.org/10.1016/j.cie.2025.111200
[3] Romli, N.A., Jamaludin, S.Z.M., Kasihmuddin, M.S.M., Mansor, M.A., & Zamri, N.E. (2024). Modelling logic mining: A log-linear approach. AIP Conference Proceedings, 2895(1), 040002. https://doi.org/10.1063/5.0192155
[4] Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22, 400–407.
[5] Loizou, N., Vaswani, S., Laradji, I., & Lacoste-Julien, S. (2021). Stochastic Polyak step size for SGD: An adaptive learning rate for fast convergence. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 130, 1–11.
[6] Charalambous, C. (1992). Conjugate gradient algorithm for efficient training of artificial neural networks. IEE Proceedings G (Circuits, Devices and Systems), 139(3), 301–310.
[7] Hassan, B.A., Moghrabi, I.A.R., Ibrahim, A.L., & Jabbar, H.N. (2025). Improved Conjugate Gradient Methods for Unconstrained Minimization Problems and Training Recurrent Neural Network. Engineering Reports, 7: e70019. https://doi.org/10.1002/eng2.70019
[8] Ibrahim, A.L., Fathi, B.G., & Abdulrazzaq, M.B. (2025). Improving three-term conjugate gradient methods for training artificial neural networks in accurate heart disease prediction. Neural Computing and Applications. https://doi.org/10.1007/s00521-025-11121-9
[9] Salleh, Z., & Alhawarat, A. (2016). An efficient modification of the Hestenes-Stiefel nonlinear conjugate gradient method with restart property. Journal of Inequalities and Applications, 2016(1), 110. https://doi.org/10.1186/s13660-016-1049-5
[10] Yuan, G., Wei, Z., & Zhao, Q. (2014). A modified Polak-Ribière-Polyak conjugate gradient algorithm for large-scale optimization problems. IIE Transactions (Institute of Industrial Engineers), 46(4), 397–413. https://doi.org/10.1080/0740817X.2012.726757
[11] Nocedal, J., & Wright, S.J. (1999). Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, New York.
[12] Hager, W., & Zhang, H. (2006). A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization, 2(1), 35–58.
[13] Andrei, N. (2007). Numerical comparison of conjugate gradient algorithms for unconstrained optimization. Studies in Informatics and Control, 16(4), 333–352.
[14] Hestenes, M., & Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409–436.
[15] Hager, W., & Zhang, H. (2005). A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization, 16(1), 170–192.
[16] Babaie-Kafaki, S. (2014). On the sufficient descent condition of the Hager-Zhang conjugate gradient methods. 4OR, 12(3), 285–292.
[17] Omar, D.H., Ibrahim, A.L., Hassan, M.M., Fathi, B.G., & Sulaiman, D.A. (2024). Enhanced Conjugate Gradient Method for Unconstrained Optimization and its Application in Neural Networks. European Journal of Pure and Applied Mathematics, 17(4), 2692–2705. https://doi.org/10.29020/nybg.ejpam.v17i4.5354
[18] Hassan, B.A., & Alashoor, H.A. (2023). On image restoration problems using new conjugate gradient methods. Indonesian Journal of Electrical Engineering and Computer Science, 29(3), 1438–1445.
[19] Hassan, B.A., & Sadiq, H.M. (2022). A new formula on the conjugate gradient method for removing impulse noise images. Bulletin of the South Ural State University. Series: Mathematical Modelling, Programming & Computer Software, 15(4), 123–130.
[20] Hassan, B.A., & Alashoor, H.A. (2022). A New Type Coefficient Conjugate on the Gradient Methods for Impulse Noise Removal in Images. European Journal of Pure and Applied Mathematics, 15(4), 2043–2053.
[21] Ataee Tarzanagh, D., & Peyghami, M.R. (2015). A new regularized limited memory BFGS-type method based on modified secant conditions for unconstrained optimization problems. Journal of Global Optimization, 63, 709–728. https://doi.org/10.1007/s10898-015-0310-7
[22] Zoutendijk, G. (1970). Nonlinear programming, computational methods. In J. Abadie (Ed.), Integer and Nonlinear Programming (pp. 37–86). North-Holland, Amsterdam.
[23] Gould, N.I.M., Orban, D., & Toint, P.L. (2003). CUTEr and SifDec: A constrained and unconstrained testing environment, revisited. ACM Transactions on Mathematical Software, 29(4), 373–394. https://doi.org/10.1145/962437.962439
[24] Moré, J.J., Garbow, B.S., & Hillstrom, K.E. (1981). Testing unconstrained optimization software. ACM Transactions on Mathematical Software, 7, 17–41. https://doi.org/10.1145/355934.355936
[25] Andrei, N. (2008). An unconstrained optimization test functions collection. Advances in Modeling and Optimization, 10, 147–161.
[26] Dolan, E.D., & Moré, J.J. (2002). Benchmarking optimization software with performance profiles. Mathematical Programming, 91, 201–213. https://doi.org/10.1007/s101070100263