On stochastic optimization and the Adam optimizer: Divergence, convergence rates, and acceleration techniques