As deep learning models continue to scale, the training and inference phases place increasingly stringent demands on computing efficiency, energy consumption, and computing architectures. The synergy between optimization algorithms and computing hardware has therefore become a pivotal research direction in artificial intelligence. This paper presents a systematic comparative study of deep learning optimization algorithms and hardware acceleration architectures, focusing on the performance differences between gradient descent-based methods and Adam and its improved variants across computing platforms. Using literature review and comparative analysis, it examines typical optimization algorithms in terms of convergence behavior, stability, computational complexity, and memory access patterns, and relates these properties to the highly parallel computing capability of GPUs as well as the dataflow computing paradigm and energy efficiency advantages of FPGAs. Particular attention is paid to the applicability of different optimization algorithms to large-scale model training, convolutional neural networks, and embedded or energy-constrained scenarios. The results show that adaptive optimizers such as Adam converge faster and are more robust on complex models and large-scale training tasks, whereas gradient descent-based methods retain clear advantages when model structures are relatively simple or computing resources are limited.
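To make the contrast between the two optimizer families concrete, the following minimal sketch implements the standard update rules compared in this paper: plain gradient descent and Adam with bias-corrected first and second moment estimates. The toy objective, learning rates, and step counts are illustrative choices, not taken from the paper; Adam's per-parameter adaptive step is what lets it handle the ill-conditioned direction that forces gradient descent to use a small learning rate.

```python
import math

def sgd_step(params, grads, lr):
    # Plain gradient descent: theta <- theta - lr * g
    return [p - lr * g for p, g in zip(params, grads)]

def adam_step(params, grads, state, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: adapt each parameter's step from running moment estimates.
    state["t"] += 1
    t = state["t"]
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g        # first moment
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g    # second moment
        m_hat = state["m"][i] / (1 - b1 ** t)                    # bias correction
        v_hat = state["v"][i] / (1 - b2 ** t)
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

# Illustrative ill-conditioned quadratic: f(x, y) = x^2 + 100*y^2
def grad(p):
    x, y = p
    return [2 * x, 200 * y]

p_sgd = [1.0, 1.0]
p_adam = [1.0, 1.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
for _ in range(200):
    p_sgd = sgd_step(p_sgd, grad(p_sgd), lr=0.009)   # lr capped by the stiff y-direction
    p_adam = adam_step(p_adam, grad(p_adam), state, lr=0.05)
```

Note that both updates touch every parameter once per step, but Adam keeps two extra state vectors (`m` and `v`), which is the memory and bandwidth overhead the paper weighs against its faster convergence on large models.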
Research Article
Open Access