Articles in this Volume

Research Article Open Access
Enhancing Artistic Style Transfer: Integrating CycleGAN, Diffusion Models, and Neural Painting for Monet-Inspired Image Generation
This project explores approaches to style transfer, specifically transforming natural images into the iconic style of Claude Monet's paintings. We examine three methods: CycleGAN-based, neural-painting-based, and diffusion-based. CycleGAN enables style transfer between domains without paired training data, neural painting simulates the physical painting process, and diffusion models leverage a denoising process for high-quality results. To evaluate the effectiveness of these approaches, we conduct a survey assessing the aesthetic appeal, naturalness, and adherence to Monet's style of the generated images. Our analysis provides insights into the strengths and limitations of each method and identifies areas for future improvement in reproducing Monet's iconic style.
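The cycle-consistency constraint that lets CycleGAN learn from unpaired photos and paintings can be sketched with toy stand-ins for the two generators (the affine functions below are illustrative placeholders, not real networks):

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 cycle loss ||F(G(x)) - x||_1, the constraint that lets
    CycleGAN train without paired photo/painting examples."""
    return np.abs(F(G(x)) - x).mean()

# Toy stand-ins for the photo->Monet and Monet->photo generators.
G = lambda x: x * 0.9 + 0.05    # "stylize"
F = lambda x: (x - 0.05) / 0.9  # approximate inverse mapping

photo = np.random.rand(3, 64, 64)
loss = cycle_consistency_loss(photo, G, F)
print(round(loss, 6))  # near 0 because F approximately inverts G
```

In the real model the loss is near zero only after training pushes F toward inverting G; here the inverse is built in to show what the objective rewards.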
Research Article Open Access
Exploiting Convolutional Recurrent Neural Networks for Enhanced EEG-based Emotion Recognition
Emotion recognition is a branch of artificial intelligence that analyzes human emotional states through facial expressions, voice, or physiological signals. It enhances human-computer interaction, facilitating more personalized and empathetic technology experiences that are crucial for fields like mental health, customer service, and human-robot interaction. In recent years, research on emotion recognition using these signals has grown rapidly, spanning multiple interdisciplinary fields. With the aid of electroencephalogram (EEG)-based brain-computer interfaces (BCIs), the emotional states of users can be sensed and analyzed. EEG offers a direct, non-intrusive insight into user emotions, enhancing user experience and system responsiveness. This approach is crucial for developing adaptive artificial intelligence (AI) in fields like healthcare, for personalized treatments, and entertainment, for immersive experiences, advancing human-technology symbiosis. This paper compares five current machine learning (ML)-based emotion recognition methods leveraging EEG signals, aiming to evaluate their effectiveness and applicability. It concludes that while both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks have their strengths, their combination provides the best performance in EEG-based emotion recognition.
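The CNN-plus-LSTM pipeline the paper favors can be sketched end to end in a few lines; everything here (channel count, kernel, hidden size, three-class output) is an illustrative stand-in, not any compared method's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy EEG: 4 channels, 256 time samples (real setups use 32+ channels).
eeg = rng.normal(size=(4, 256))

# "CNN" stage: one shared 1-D convolution per channel plus striding,
# a minimal stand-in for the convolutional feature extractor.
kernel = rng.normal(size=8)
feats = np.stack([np.convolve(ch, kernel, mode='valid')[::4] for ch in eeg])
feats = feats.T                      # (time_steps, channels) = (63, 4)

# "LSTM" stage: a single hand-rolled LSTM cell scanned over time.
H, D = 16, feats.shape[1]
Wx = rng.normal(size=(4 * H, D)) * 0.1
Wh = rng.normal(size=(4 * H, H)) * 0.1
b = np.zeros(4 * H)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

h = c = np.zeros(H)
for x in feats:
    z = Wx @ x + Wh @ h + b
    i, f, o, g = z[:H], z[H:2*H], z[2*H:3*H], np.tanh(z[3*H:])
    c = sigmoid(f) * c + sigmoid(i) * g   # gated cell-state update
    h = sigmoid(o) * np.tanh(c)           # gated hidden output

logits = rng.normal(size=(3, H)) @ h      # 3 emotion classes
print(logits.shape)
```

The convolution compresses each channel's raw waveform into local features; the LSTM then models how those features evolve over time, which is the division of labor behind the CNN+LSTM result.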
Research Article Open Access
Design and Analysis of Miller Compensated Two-Stage Operational Amplifier
With the development of integrated-circuit technology, electronic equipment continues to shrink while operating speeds continue to rise. Operational amplifiers are key circuits in analog ICs, and the performance required of them is increasingly high; the research and design of high-performance operational amplifiers has therefore become a key topic. In this work, a CMOS two-stage operational amplifier with high unity-gain bandwidth is analysed and designed with an appropriate Miller-compensation technique to improve its frequency characteristics and stability. The design is based on a 180 nm CMOS process, and its performance is simulated and analysed in the Cadence software environment. Simulation results show that the amplifier achieves a gain of 61.81 dB, a unity-gain bandwidth (GB) of 907.42 MHz, and a phase margin of 75.02°, demonstrating high unity-gain bandwidth while ensuring gain and stability. Through this study, the principles of two-stage operational amplifiers and their compensation, as well as the design and simulation, are presented in detail in both theoretical and experimental sections.
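The reported bandwidth and phase margin are governed by the standard two-stage Miller relations; the textbook forms below are given for orientation only and are not reproduced from the paper (here \(g_{m1}, g_{m2}\) are the stage transconductances, \(C_c\) the compensation capacitor, \(C_L\) the load capacitance, \(p_2\) the non-dominant pole, and \(z_1\) the right-half-plane zero):

```latex
\mathrm{GBW} \approx \frac{g_{m1}}{2\pi C_c}, \qquad
p_2 \approx \frac{g_{m2}}{2\pi C_L}, \qquad
z_1 \approx \frac{g_{m2}}{2\pi C_c},

\mathrm{PM} \approx 90^\circ
  - \arctan\!\left(\frac{\mathrm{GBW}}{p_2}\right)
  - \arctan\!\left(\frac{\mathrm{GBW}}{z_1}\right)
```

Increasing \(C_c\) trades bandwidth for phase margin, which is the central tension a Miller-compensated design like this one must balance.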
Research Article Open Access
Research on Image Feature Extraction Based on Convolutional Neural Network
This article explores the impact of various hyperparameters on the performance of image feature extraction using convolutional neural networks (CNNs), with a focus on learning rate, dropout rate, batch size, and the number of epochs. Using the CIFAR-10 dataset, extensive experiments were conducted to optimize these parameters, aiming to achieve high accuracy while avoiding overfitting. The findings underscore the importance of carefully selecting these hyperparameters to balance training efficiency and model performance. Through a rigorous analysis of the effects of these hyperparameters on model performance under various configurations, including training accuracy, test accuracy, training loss, and test loss, our experimental results indicate that the model achieves optimal performance with a learning rate of 0.0001 and a dropout rate of 0.5. The model demonstrates optimal performance in avoiding overfitting when the number of training epochs is set to 10. Additionally, although batch size has a relatively minor effect on overall model optimization, a slight improvement in performance was observed when the batch size was set to 32.
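The select-by-validation sweep described above is not specific to CNNs; the sketch below runs the same pattern with a tiny numpy logistic-regression stand-in on synthetic data (the data, model, and learning-rate grid are illustrative, not the paper's CIFAR-10 setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data as a lightweight stand-in for CIFAR-10.
X = rng.normal(size=(400, 20))
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(float)

def train(lr, epochs=10):
    """Logistic regression by full-batch gradient descent; returns
    training accuracy for one hyperparameter setting."""
    w = np.zeros(20)
    for _ in range(epochs):
        z = np.clip(X @ w, -30, 30)        # avoid exp overflow
        p = 1 / (1 + np.exp(-z))
        w -= lr * X.T @ (p - y) / len(y)
    return ((X @ w > 0).astype(float) == y).mean()

# The sweep itself: the same grid-and-compare pattern the paper
# applies over learning rate, dropout, batch size, and epochs.
results = {lr: train(lr) for lr in (1e-4, 1e-2, 1.0)}
best_lr = max(results, key=results.get)
print(results, best_lr)
```

A full study would compare held-out test accuracy rather than training accuracy, exactly to catch the overfitting the abstract discusses.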
Research Article Open Access
Automatic Tracking Control Based on Bayesian Optimization
This paper discusses the control problem of intelligent tracking vehicles in complex dynamic environments and proposes a solution combining a Bayesian optimization algorithm with PID control. Traditional PID control faces many challenges when dealing with time-varying and nonlinear systems, and the operating environment of intelligent tracking vehicles is complex and changeable, making it difficult to adjust parameters in real time to suit different operating conditions. This paper therefore introduces a Bayesian optimization algorithm that predicts the optimal parameter combination of the PID controller by constructing a hyperparameter optimization model, effectively improving the control performance and robustness of the system.
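As a rough illustration of such a loop, the sketch below tunes a single proportional gain for a toy first-order plant using a hand-rolled Gaussian-process surrogate and a lower-confidence-bound acquisition; the plant, cost function, and one-dimensional search space are deliberate simplifications, not the paper's setup:

```python
import numpy as np

def pid_cost(kp, ki=0.5, kd=0.0, dt=0.05, steps=200):
    """Integrated squared error of a discrete PID driving a first-order
    plant (dy/dt = -y + u) to a unit step setpoint."""
    y, integ, prev_e, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        e = 1.0 - y
        integ += e * dt
        u = kp * e + ki * integ + kd * (e - prev_e) / dt
        prev_e = e
        y += dt * (-y + u)          # Euler step of the plant
        cost += e * e * dt
    return cost

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

# Tiny 1-D Bayesian-optimization loop: GP surrogate plus a
# lower-confidence-bound acquisition over the proportional gain only.
rng = np.random.default_rng(1)
X = list(rng.uniform(0.1, 10.0, size=3))    # initial random evaluations
Y = [pid_cost(k) for k in X]
grid = np.linspace(0.1, 10.0, 200)
for _ in range(10):
    Xa, Ya = np.array(X), np.array(Y)
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))
    Ks = rbf(grid, Xa)
    mu = Ks @ np.linalg.solve(K, Ya)                      # posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    lcb = mu - 2.0 * np.sqrt(np.maximum(var, 0.0))
    k_next = grid[np.argmin(lcb)]    # explore/exploit trade-off
    X.append(k_next)
    Y.append(pid_cost(k_next))

best_kp = X[int(np.argmin(Y))]
print(best_kp, min(Y))
```

Each simulated run stands in for one real trial on the vehicle; the GP posterior lets the optimizer propose promising gains after only a handful of such trials, which is the practical appeal over hand-tuning.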
Research Article Open Access
Advancing Image Animation: A Comparative Analysis of GAN-Based and Diffusion-Based Models
This paper provides an in-depth analysis of the latest advancements in image animation, focusing on two prominent models: Motion Representations for Articulated Animation (MRAA) and MagicAnimate. MRAA revolutionizes Generative Adversarial Network (GAN)-based animation by employing regional descriptors instead of traditional key points, significantly enhancing the accuracy of motion capture and segmentation for complex articulated movements. MagicAnimate, on the other hand, utilizes a diffusion-based framework with temporal attention mechanisms, ensuring high fidelity and temporal consistency across animated sequences. The paper discusses the methodologies, datasets, and preprocessing techniques used in these models, offering a thorough comparison of their performance metrics on various benchmark datasets. Through this comparative analysis, the paper highlights the strengths and limitations of these cutting-edge technologies, emphasizing MRAA's superior handling of complex movements and background dynamics, and MagicAnimate's excellence in identity preservation and temporal coherence. The study concludes by proposing future research directions, such as developing hybrid models that combine the advantages of GANs and diffusion techniques, to further enhance the realism, versatility, and control of image animation systems.
Research Article Open Access
In-depth Study and Application Analysis of Multimodal Emotion Recognition Methods: Multidimensional Fusion Techniques Based on Vision, Speech, and Text
Emotion recognition technology, pivotal in fields such as medical health, game entertainment, and human-computer interaction, benefits significantly from multimodal approaches. This paper delves into the techniques and applications of multimodal emotion recognition, focusing on fusion methods that integrate visual, speech, and text data. Emotion recognition through single modalities often faces limitations such as susceptibility to noise and low accuracy, whereas multimodal systems exhibit enhanced performance by leveraging combined data sources. The primary fusion techniques discussed include feature-level, decision-level, and model-level integrations. Feature-level fusion amalgamates multiple data types early in the processing stage, improving detection robustness. Decision-level fusion, on the other hand, involves synthesizing results from separate analyses, offering flexibility and ease of integration. Model-level fusion allows for deep interactions between modalities, potentially capturing more complex emotional states. This study confirms that multimodal emotion recognition systems generally surpass the performance of their single-modal counterparts, advocating for further exploration into sophisticated fusion techniques to boost accuracy and applicability across various domains.
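The feature-level versus decision-level distinction can be made concrete with a toy example; the dimensions, random weights, and three-class output below are arbitrary stand-ins, not a real recognizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality features for one sample (dimensions are made up).
vision = rng.normal(size=8)
speech = rng.normal(size=6)
text   = rng.normal(size=4)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Feature-level fusion: concatenate early, classify once.
fused = np.concatenate([vision, speech, text])   # shape (18,)
W = rng.normal(size=(3, fused.size))             # 3 emotion classes
p_feature = softmax(W @ fused)

# Decision-level fusion: classify each modality, then average the votes.
heads = [rng.normal(size=(3, m.size)) for m in (vision, speech, text)]
p_decision = np.mean([softmax(h @ m) for h, m in
                      zip(heads, (vision, speech, text))], axis=0)

print(p_feature.round(3), p_decision.round(3))
```

Feature-level fusion lets one classifier see cross-modal correlations; decision-level fusion keeps the modalities independent until the vote, which is why the abstract credits it with flexibility and ease of integration.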
Research Article Open Access
Enhanced Personalized Text Generation Using User Embeddings and Attention Mechanisms
Personalized text generation plays an important role in modern applications such as content recommendation, conversational agents, and review generation, where adapting outputs to user-specific preferences enhances engagement. This research proposes an enhanced model architecture that leverages user-specific features, subreddit characteristics, and bidirectional LSTMs to improve personalization in text generation tasks. Building upon a baseline sequence-to-sequence model, the improved model incorporates user embeddings, self-attention mechanisms, and residual connections for richer context understanding. Additionally, features such as post count and average score are integrated to capture user behavior and preferences. The model was trained and tested on a large-scale Reddit dataset, with results showing significant improvements in both accuracy and the relevance of generated text. The final architecture achieved 83% validation accuracy and produced more coherent and contextually appropriate outputs than the baseline. Future work will focus on refining feature engineering and enhancing the model's ability to generate even more personalized and dynamic content, including multi-modal data such as images or user engagement over time.
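A minimal sketch of the conditioning step described, assuming a hypothetical user-embedding table and the two behaviour features named in the abstract (post count, average score); all sizes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, emb_dim, hidden_dim = 100, 16, 32
user_emb = rng.normal(size=(n_users, emb_dim))  # learned user table

def personalize(decoder_state, user_id, post_count, avg_score):
    """Condition a decoder state on the user: an embedding lookup plus
    the scalar behaviour features mentioned in the abstract."""
    feats = np.array([np.log1p(post_count), avg_score])
    return np.concatenate([decoder_state, user_emb[user_id], feats])

state = rng.normal(size=hidden_dim)
ctx = personalize(state, user_id=7, post_count=120, avg_score=3.4)
print(ctx.shape)  # (32 + 16 + 2,) = (50,)
```

In the real model the table rows are trained jointly with the decoder so that users with similar writing styles end up with nearby embeddings; here the lookup-and-concatenate wiring is the point.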
Research Article Open Access
Enhancing Real-Time Vision Systems: Integrating Dynamic Vision Sensors with Vision Transformers to Increase Computational Efficiency
Optimizing and understanding the computational efficiency and accuracy of vision systems is crucial as they become increasingly integrated into daily life. Dynamic Vision Sensors (DVS) and Vision Transformers (ViTs) lead computer vision technology with efficient object recognition and image processing. However, the high computational complexity of DVS data poses a problem for real-time implementations. Merging these technologies can enhance vision system performance in dynamic environments and optimize real-time DVS processing. In our work, we use a ViT architecture to classify the DVS128 dataset and compare our results with existing works using spiking neural networks (SNNs). We analyze how our method affects accuracy and loss, experimenting with different DVS-to-ViT input patch sizes. Our results show that a large patch size of 32x32 pixels yields higher accuracy and lower loss than 4x4 pixel patches as epochs increase. Our method also achieved a high accuracy of 98.4% and a low loss of 0.22 within five epochs, significantly outperforming previous works averaging 93.13% accuracy over more epochs. These results highlight ViTs' strong potential for real-time DVS data classification in applications that require high accuracy, like autonomous vehicles and surveillance systems.
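The patch-size comparison comes down to how many tokens a 128x128 DVS frame is split into; the sketch below shows the standard non-overlapping patch extraction a ViT input layer performs (the zero frame is a placeholder for real event data):

```python
import numpy as np

def to_patches(img, patch):
    """Split an HxW image into non-overlapping patch tokens, as a ViT
    input layer does before the linear projection."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0
    return (img.reshape(h // patch, patch, w // patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(-1, patch * patch))

frame = np.zeros((128, 128))        # DVS128 frames are 128x128 pixels
print(to_patches(frame, 32).shape)  # 16 tokens of 1024 values each
print(to_patches(frame, 4).shape)   # 1024 tokens of 16 values each
```

Since self-attention cost grows quadratically with token count, 32x32 patches (16 tokens) are far cheaper per layer than 4x4 patches (1024 tokens), which is one reason the larger patch size can also train faster.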
Research Article Open Access
Enhancing DF-GAN for Text-to-Image Synthesis: Improved Text-Encoding and Network Structure
Text-to-image synthesis is one of the most challenging and popular tasks in machine learning, with many models developed to improve performance in this area. Deep Fusion Generative Adversarial Networks (DF-GAN) is a straightforward but efficient model for image generation, but it has three key limitations. First, it only supports sentence-level textual descriptions, restricting its ability to extract fine-grained features from word-level inputs. Second, the structure of the residual layers and blocks, along with key parameters, could be optimized for better performance. Third, existing evaluation metrics, such as Fréchet Inception Distance (FID), tend to place undue emphasis on irrelevant features like background, which is problematic when the focus is on generating specific objects. To address these issues, we introduced a new text encoder that enhances the model's capacity to process word-level descriptions, leading to more precise and text-consistent image generation. Additionally, we optimized key parameters and redesigned the convolutional and residual network structures, resulting in higher-quality images and reduced running time. Lastly, we proposed a new evaluation theory tailored to assess the quality of specific objects within the generated images. These improvements make the enhanced DF-GAN more effective in generating high-quality, text-aligned images efficiently.
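The sentence-level versus word-level distinction can be illustrated with toy embeddings; the vocabulary, embedding size, and single attention query below are hypothetical stand-ins, not DF-GAN's actual encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = {"a": 0, "small": 1, "yellow": 2, "bird": 3}
emb = rng.normal(size=(len(vocab), 8))          # toy word embeddings

tokens = [vocab[w] for w in "a small yellow bird".split()]
word_feats = emb[tokens]                        # (4, 8): word-level
sent_feat = word_feats.mean(axis=0)             # (8,): sentence-level

# Word-level conditioning lets the generator attend to individual
# words; here one query vector produces per-word attention weights.
query = rng.normal(size=8)
attn = np.exp(word_feats @ query)
attn /= attn.sum()
cond = attn @ word_feats                        # attention-weighted summary
print(word_feats.shape, sent_feat.shape, attn.round(3))
```

A sentence-level encoder hands the generator only the pooled vector, losing which word said "yellow"; keeping the per-word matrix is what makes fine-grained conditioning possible.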