As deep learning models continue to scale, the training and inference phases place increasingly stringent demands on computing efficiency, energy consumption, and computing architectures. The synergy between optimization algorithms and computing hardware has therefore become a pivotal research direction in artificial intelligence. This paper presents a systematic comparative study of deep learning optimization algorithms and hardware acceleration architectures, focusing on the performance differences between gradient descent-based methods and Adam and its improved variants across computing platforms. Using literature review and comparative analysis, it examines typical optimization algorithms in terms of convergence behavior, stability, computational complexity, and memory access patterns, and relates these properties to the highly parallel computing capability of GPUs as well as the dataflow computing paradigm and energy efficiency advantages of FPGAs. Particular attention is paid to the applicability of different optimization algorithms to large-scale model training, convolutional neural networks, and embedded or energy-constrained scenarios. The results show that adaptive optimizers such as Adam converge faster and are more robust on complex models and large-scale training tasks, whereas gradient descent-based methods retain clear advantages when model structures are relatively simple or computing resources are limited.
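To make the contrast between the two optimizer families concrete, the following minimal sketch implements the standard update rules compared in this paper: plain gradient descent and Adam with bias-corrected first and second moment estimates. The toy objective, learning rates, and step counts are illustrative choices, not taken from the paper; Adam's per-parameter adaptive step is what lets it handle the ill-conditioned direction that forces gradient descent to use a small learning rate.

```python
import math

def sgd_step(params, grads, lr):
    # Plain gradient descent: theta <- theta - lr * g
    return [p - lr * g for p, g in zip(params, grads)]

def adam_step(params, grads, state, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: adapt each parameter's step from running moment estimates.
    state["t"] += 1
    t = state["t"]
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g        # first moment
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g    # second moment
        m_hat = state["m"][i] / (1 - b1 ** t)                    # bias correction
        v_hat = state["v"][i] / (1 - b2 ** t)
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

# Illustrative ill-conditioned quadratic: f(x, y) = x^2 + 100*y^2
def grad(p):
    x, y = p
    return [2 * x, 200 * y]

p_sgd = [1.0, 1.0]
p_adam = [1.0, 1.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
for _ in range(200):
    p_sgd = sgd_step(p_sgd, grad(p_sgd), lr=0.009)   # lr capped by the stiff y-direction
    p_adam = adam_step(p_adam, grad(p_adam), state, lr=0.05)
```

Note that both updates touch every parameter once per step, but Adam keeps two extra state vectors (`m` and `v`), which is the memory and bandwidth overhead the paper weighs against its faster convergence on large models.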
Research Article
Open Access