Computer vision algorithms have advanced steadily in recent years, and small object detection has become a key task in the field. Compared with medium and large targets, however, small targets cover few pixels and are easily disturbed by factors such as background interference, which makes progress more difficult. Researchers have proposed various methods to address these challenges, and the three most representative families are built on YOLO, Transformer, and diffusion models. This article provides a detailed overview and comparison of these three approaches. YOLO-based methods stand out in real-time detection, which they improve through multi-scale feature enhancement, structural optimization, and loss-function adjustment. Transformer-based methods improve the accuracy and precision of small-target identification by adjusting the attention mechanism, using hybrid structures, and fusing multimodal features. Diffusion-based methods adapt the diffusion process to detection, which involves constructing diffusion bounding boxes and diffusion engines. Finally, this article summarizes the advantages and limitations of these methods and discusses potential future research directions. By providing a unified overview of the three main research paradigms, this study helps researchers understand current progress, identify existing challenges, and explore new possibilities for advancing small object detection.
Research Article
Open Access