Volume 44 | Applied and Computational Engineering

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230134

Application analysis of financial data mining in investment decision

Rongying Zeng

Amidst the escalating complexities that define the contemporary financial market and the rapid proliferation of information, traditional methods of formulating investment decisions confront increasingly formidable challenges. In response to these intricate dynamics, the realm of financial data mining has emerged as a prominent avenue of scholarly investigation within the investment domain. This paper's fundamental objective is to conduct a comprehensive retrospective analysis of the diverse applications of financial data mining in the context of investment decision-making. This scholarly pursuit entails a meticulous synthesis of existing academic inquiries, concurrently proposing potential avenues for future advancements in this field. By undertaking this academic endeavor, the paper strives to make substantive contributions to the refinement of methodologies essential for adeptly navigating the multifaceted landscape of modern investments. As the financial landscape continues to evolve, this study aspires to offer insights that not only enhance the efficacy of investment strategies but also foster a deeper understanding of the intricate interplay between data mining techniques and decision-making processes. Through the synthesis of empirical findings and theoretical perspectives, this paper seeks to underscore the pertinence of leveraging data-driven approaches in investment practices, thereby promoting a more informed and sophisticated investment landscape.

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230074

Data analysis based on COVID-19—Important factors in the COVID-19 outbreak

Tianzuo Li, Yilin Pan

With multiple industries around the world significantly affected since the 2020 outbreak, the COVID-19 pandemic has highlighted the importance of health care systems in managing and containing infectious diseases. This article examines the relationship between the level of health care services and the number of COVID-19 infections, taking into account factors such as detection and contact tracing, case treatment and management, and resource constraints. While countries with stronger health care systems may be better able to respond to a pandemic, resource constraints and other factors may also play a role in determining infection rates. Overall, the relationship between health care and COVID-19 infections is complex and influenced by multiple factors, highlighting the need for sustained investment in health care infrastructure and systems. In this article, we aim to analyze which factors influence the number of infections in the COVID-19 outbreak. Our study shows that some factors are significantly associated with the number of infections in the epidemic, while others are not. These findings are consistent with our expectations, and we close with recommendations and an outlook.

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230078

Predictive model on detecting ChatGPT responses against human responses

Zhaokai He, Ruolong Mao, Yu Liu

The paper investigates the critical differences between AI-generated text and human responses in terms of linguistic patterns, structure, and content. The research uses the HC3 datasets, collected in 2023. Our results show that ChatGPT with GPT-3.5 systematically uses conjunctions and certain combinations of words in conversations more often than humans do. Our model identifies AI-generated answers with high accuracy.

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230079

Sentiment analysis applied on Amazon reviews

Jiaqi Li, Qi Pan, Yihao Wang

With the rapid growth of e-commerce, accurately capturing buyers' sentiments through their reviews is increasingly vital for online marketplaces. In this paper, we explore effective methods for sentiment analysis of these reviews. We use a review dataset containing user ratings and comments on Amazon products. Applying a two-step methodology of data preprocessing and model building, we intend to employ models such as LSTM and SVM to analyze Amazon customer reviews and gain insight into the models' performance. The findings of this study may also allow e-commerce platforms to provide better service to sellers and buyers.

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230093

Applying self-attention model to learn both Empirical Risk Minimization and Invariant Risk Minimization for multimedia recommendation

Hanyu Zhao, Yangqi Huang, Kunqi Zhao, Sizhuo Wang

Multimedia recommendation systems have many applications in our daily life. However, accurately capturing a customer's preference remains a difficult problem. Invariant Risk Minimization (IRM) and Empirical Risk Minimization (ERM) are two ways to learn a customer's preference. Still, both frameworks show some limitations: although ERM performs excellently in a single environment, it fails to generalize well when faced with multiple and new domains. On the other hand, IRM learns invariant features across heterogeneous environments, but it lacks theoretical guarantees and performs less effectively where the invariants are unclear. This paper proposes an ERM and IRM Optimized Rating Framework (EIOR) as our final recommender model with direct rating scores. EIOR enhances the accuracy and functionality of multimedia recommendation systems by using self-attention mechanisms to combine IRM and ERM with adjusted attention weights. Specifically, IRM learns the invariant parts across different environments, while ERM learns the variant parts. With self-attention, we can adaptively allocate attention weights to the two parts and seek the optimal pair of attention weights based on the loss function. We demonstrate EIOR on a cutting-edge recommender model, UltraGCN, and run all experiments on the open TikTok multimedia dataset. The results validate the effectiveness of EIOR in comparison with using invariant representations alone under the IRM framework.
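
The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' EIOR implementation: a two-way softmax stands in for the self-attention layer, a plain search over candidate logits stands in for training, and every name and number below (`combine_scores`, `best_logits`, the toy samples) is a hypothetical chosen for illustration.

```python
import math


def softmax(logits):
    """Turn raw attention logits into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_scores(invariant_score, variant_score, attention_logits):
    """Blend the invariant (IRM) and variant (ERM) scores with attention weights."""
    w_invariant, w_variant = softmax(attention_logits)
    return w_invariant * invariant_score + w_variant * variant_score


def best_logits(samples, candidates):
    """Pick the candidate logit pair with the lowest mean squared rating error.

    samples: list of (invariant_score, variant_score, true_rating) tuples.
    A trained model would learn the logits by gradient descent; the plain
    search here only illustrates choosing the weights by the loss.
    """
    def mean_squared_error(logits):
        errors = [(combine_scores(inv, var, logits) - rating) ** 2
                  for inv, var, rating in samples]
        return sum(errors) / len(errors)

    return min(candidates, key=mean_squared_error)


# Hypothetical toy data: the true rating sits closer to the invariant score.
samples = [(4.0, 2.0, 3.8), (3.0, 5.0, 3.2)]
candidates = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(best_logits(samples, candidates))
```

On this toy data the search favors the logit pair that puts most of the weight on the invariant score, because the ratings were constructed to lie near it.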

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230097

Principal Component Analysis variants for Parkinson datasets

Chen Cheng

Principal Component Analysis (PCA) is one of the most fundamental dimension reduction methods and still merits further research. With the widespread popularity of machine learning and the arrival of the era of big data, dimension reduction, and PCA in particular, has become a hot topic. However, although many researchers focus on PCA methods, few studies have applied them to Parkinson datasets. The aim of our work is therefore to discuss PCA variants for Parkinson datasets. This paper first introduces the three most commonly used PCA methods: PCA, Sparse PCA and Kernel PCA, and then introduces the Support Vector Machine (SVM) used to measure the dimension reduction effect. After that, we introduce the Parkinson's dataset and the meanings of root mean square error (RMSE), overall accuracy, Cohen’s kappa (Kappa) and computational time, the indicators used to measure the dimensionality reduction effect. Finally, we identify the differences among the PCA methods on the Parkinson dataset by comparing these indicators on the data obtained after dimensionality reduction with each method.
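
Three of the indicators named in this abstract have standard definitions, sketched below in plain Python. This is a generic illustration rather than the paper's evaluation code; the function names and the toy labels are assumptions, and the classifier and PCA steps themselves are not shown.

```python
import math
from collections import Counter


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    observed = overall_accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label the same class at random.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class ever occurs")
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary labels (1 = patient, 0 = control).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(rmse(y_true, y_pred), overall_accuracy(y_true, y_pred), cohens_kappa(y_true, y_pred))
```

Kappa is lower than accuracy here because half of the observed agreement would be expected by chance given the label frequencies.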

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230155

The comparison and analysis of Skip-gram and CBOW in creating financial sentimental dictionary

Xingjian Zhang, Lu Zhang

Textual analysis is increasingly used in various fields due to data availability, computing power, and machine learning techniques. In finance, sentiment analysis is essential for obtaining excess returns, and building domain-specific lexicons using word2vec is a prevalent method. The CBOW and Skip-gram algorithms use different predictive methodologies, and their performance depends on the task and dataset. This paper reviews financial sentiment analysis using a dictionary method and compares the performance of the two algorithms. CBOW trains faster than Skip-gram when dealing with a small amount of text data, but as the amount of data increases, Skip-gram becomes more efficient. In addition, Skip-gram captures more synonyms of the selected words than CBOW.
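
The difference in predictive methodology between the two algorithms can be seen in how each builds its training examples. The sketch below is only an illustration under assumed names (`skipgram_pairs`, `cbow_examples`) and a made-up three-word sentence; it does not train embeddings and is not the paper's code.

```python
def context_indices(position, length, window):
    """Indices within `window` tokens of `position`, excluding the position itself."""
    start = max(0, position - window)
    stop = min(length, position + window + 1)
    return [j for j in range(start, stop) if j != position]


def skipgram_pairs(tokens, window):
    """Skip-gram: the center word predicts each context word separately."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in context_indices(i, len(tokens), window):
            pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """CBOW: the whole context window jointly predicts the center word."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j] for j in context_indices(i, len(tokens), window)]
        if context:
            examples.append((context, center))
    return examples


# Hypothetical tokenized sentence from a financial text.
tokens = ["profit", "rose", "sharply"]
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

Skip-gram produces one example per (center, context) pair, while CBOW produces one example per position, so the same text yields more Skip-gram updates than CBOW updates.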

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230156

A new approach based on machine learning to certain diseases

Ruyi Teng, Tianqi Zhu, Shuyan Qiao

Never before has the importance of data science been so emphasized as in modern society. Focusing on obtaining conclusive results from the implicit features concealed in a huge amount of data, data science plays a remarkable role in various fields, including modern medical practice. Although cutting-edge technology is capable of coping with numerous diseases, certain diseases, such as breast cancer and Parkinson’s disease, still compromise people’s health because they are difficult to predict and prone to worsen. To deal with that problem, we apply three different machine learning methods to two different datasets to test classification performance. In the paper, we first clarify the principle of each machine learning method (three different classifiers). Then, we conduct our experiment, during which the decisive parameters of the classifiers are set by specific search algorithms. We also introduce metrics, along with their principles, for evaluating the numerical results obtained by the different classifiers. Next, we discuss the results by comparing the values of the metrics that represent the performance of each method, and thereby obtain optimal classifiers for the two datasets. In the final stage of the paper, we discuss our experiment’s limitations as well as its prospects, including further application in other fields.

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230241

Incorporating emotional trend into multi-emotion analysis models for long-text sentiment analysis

Yu Zhang

Sentiment analysis plays a vital role in natural language processing (NLP) and has garnered significant attention across different domains. However, multi-emotion analysis of long text is still a challenging task due to the intricate emotional nuances that are conveyed. In this paper, a novel approach for long-text multi-emotion analysis is proposed by integrating emotional trends. This integration aims to enhance the ability of the model to recognize emotions by including word-level sentiment scores as supplementary features. To achieve this, the ISEAR and IMDB datasets are leveraged to investigate the impact of sentiment scores with varying weights on three models: BiLSTM, CNN, and CNN+BiLSTM. The models are trained for 20 and 50 epochs and evaluated by accuracy, precision, recall, F1 score, ROC curve and AUC value. The experimental results indicate that the incorporation can improve the processing speed of the multi-emotion analysis task while maintaining performance with a 66.7% probability. The most notable improvement reduced the running time by 33.42% relative to the baseline model. In the best case, the accuracy of the model increased by 2.26% and the F1 score increased by 2.16% without affecting the running speed.

Research Article Open Access

Published 5 March 2024 DOI: 10.54254/2755-2721/44/20230247

Enhancing recommendation with causal embedding: Considering social network influence

Zhuyiheng Chu, Pengxiang Zhang, Yibo Zhou, Shihan Huang, Yanze Guo

In the realm of recommendation models, we consistently rely on observational interaction data. This data encompasses a variety of aspects, such as user conformity and genuine user interests. The key challenge for recommender systems is to extract a user's authentic interests from this interaction data in order to provide accurate recommendations. The current method, DICE, attempts to separate conformity and interest by assigning distinct embeddings for each to users and items. The method ensures that each embedding captures only one causal factor through training with specific causal data. In our research, we've enhanced this existing method by incorporating social networks into the disentanglement of conformity and real interest from observational interaction data. The results from our proposed method surpass those of the prevailing baseline, demonstrating significant improvements across various backbone models using a real dataset. Furthermore, we conducted a sensitivity analysis and provided recommendations for scenarios in which our new model would be most effective.
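
The disentangling idea described for DICE, separate interest and conformity embeddings for each user and item, can be sketched as follows. This is a simplified illustration, not the authors' model: it assumes the overall matching score is the sum of an interest match and a conformity match, it leaves out the social network component entirely, and all names and vectors are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def disentangled_score(user, item):
    """Score a user-item pair from separate interest and conformity embeddings.

    Assumption for this sketch: the overall score is the sum of the two parts.
    """
    interest = dot(user["interest"], item["interest"])
    conformity = dot(user["conformity"], item["conformity"])
    return {"interest": interest, "conformity": conformity, "total": interest + conformity}


# Hypothetical embeddings: a weak interest match on a highly popular item.
user = {"interest": [1.0, 0.0], "conformity": [0.5]}
item = {"interest": [0.2, 0.9], "conformity": [2.0]}
print(disentangled_score(user, item))
```

Keeping the two parts separate makes it possible to rank by the interest part alone when the goal is to recover what a user genuinely prefers rather than what is merely popular.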

Articles in this Volume