Articles in this Volume

Research Article Open Access
Review on audit data visualization method based on R language
To remain competitive in today's market, companies must constantly assess and improve their operations, and financial data analysis has emerged as a crucial tool for this purpose. Executives can use data analysis to gain a deeper understanding of the facts underlying their data and make better-informed decisions about their company and the market. The multiple data models available in R make it a fully functional language environment with tools for statistical analysis and data visualization. With its ability to enhance the quality of statistical computation and graphical analysis, the R language is well suited to the industrial data analysis environment. In this paper, we use a literature review methodology to examine the domestic and international literature on R-based big data visualization auditing, analyze the application areas of R-based big data audit visualization, discuss the benefits and challenges of big data auditing, demonstrate how R can be adapted to real-world auditing scenarios, and offer reasonable recommendations and an optimistic outlook for the field's future.
Research Article Open Access
The virtual face modeling method based on user facial recognition and Unreal Engine 5 MetaHuman Creator
Modeling a virtual face from a real face is a technique for which demand has grown enormously in recent years, yet the methods are still at an early stage and are used only in research and business. As demand grows, fields beyond business and research will require this technique; however, the cost of professional modeling is unaffordable for individual users. Therefore, through theoretical analysis and a literature review, this paper illustrates the possibility of combining face recognition technology with MetaHuman Creator for face modeling, with the goal of reducing the cost of face scanning and popularizing the technique among individual users.
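As a hedged illustration of the face-recognition half of such a pipeline, the sketch below extracts facial landmarks from a user photo with MediaPipe's Face Mesh; the input file name is hypothetical, and the mapping of landmarks onto MetaHuman Creator is not specified by the paper, so it is omitted.

```python
# Hedged sketch: extract facial landmarks from a photo with MediaPipe Face Mesh.
# The file name is hypothetical; the MetaHuman Creator mapping step is omitted.
import cv2
import mediapipe as mp

image = cv2.imread("user_photo.jpg")  # hypothetical input photo
with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
    result = mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if result.multi_face_landmarks:
    # 468 normalized (x, y, z) points describing the facial geometry
    landmarks = [(p.x, p.y, p.z) for p in result.multi_face_landmarks[0].landmark]
    print(f"extracted {len(landmarks)} facial landmarks")
```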
Research Article Open Access
Application test of PCA based improved K-means clustering algorithm in analyzing NGO assistance needs in less developed countries
In today's society, where data volumes grow by petabytes (PB) or exabytes (EB), we live in an era of big data explosion, yet much of this data is unlabeled or unstructured. Compared with complex supervised learning, unsupervised learning on unlabeled data has great potential and value for social development. The K-means clustering algorithm is one of the most commonly used algorithms in unsupervised learning. However, a study of the shortcomings of K-means reveals a problem: every attribute of the data set must be converted to a numeric type, and distances are measured against arithmetic-mean centroids, so different random initializations influence the final clustering results to varying degrees and can ultimately produce excessive decision deviation, especially for noisy, high-dimensional, nonlinear social big data. To address this problem, this paper tests the application of a PCA-based improved K-means clustering algorithm to the analysis of NGO assistance needs in less developed countries. First, national data for 167 less developed countries are read and cleaned. Second, the data are visualized, prepared, and re-scaled. Principal component analysis is used to analyze the data and handle outliers. Clustering tendency is assessed with the Hopkins statistic, a K-means model is fitted accordingly, and a list of the countries most in need of assistance is produced. Finally, the tests show that PCA-based data cleaning can effectively reduce data noise and improve the clustering results.
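The pipeline the abstract describes (re-scaling, PCA, a Hopkins clusterability check, then K-means) can be sketched as below; this is a minimal illustration assuming scikit-learn, with the file name and cluster count chosen hypothetically rather than taken from the paper.

```python
# Minimal sketch of the described pipeline, assuming scikit-learn; file name and
# cluster count are hypothetical choices.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def hopkins(X, m=50, seed=0):
    """Hopkins statistic: values well above 0.5 suggest the data are clusterable."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sample = X[rng.choice(n, m, replace=False)]
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    u = nn.kneighbors(uniform, n_neighbors=1)[0].sum()       # uniform point -> nearest datum
    w = nn.kneighbors(sample, n_neighbors=2)[0][:, 1].sum()  # datum -> nearest other datum
    return u / (u + w)

data = pd.read_csv("country_data.csv").select_dtypes("number")  # hypothetical file, 167 rows
X = PCA(n_components=0.9).fit_transform(StandardScaler().fit_transform(data))
print("Hopkins statistic:", hopkins(X))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)  # 3 clusters assumed
```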
Research Article Open Access
Performances evaluation of machine learning models on income forecasting
Job seekers, especially those looking for their first job, often lack sufficient experience and guidance, which makes it difficult for them to obtain satisfactory salaries; salary prediction is therefore very important. For individuals, income ranges can be estimated; for companies, such estimates can guide salary adjustments, prevent the loss of talented personnel, increase revenue, and reduce operating costs; for governments, these estimates can provide a macro-level assessment of overall income for a large area, such as predicting per-capita GDP in a city, making it easier to adjust economic policy and grasp macro development trends. This article trains three models, decision trees, random forests, and neural networks, on the Adult Income Dataset from Kaggle, which contains 32,561 records with 15 attributes, including age, education level, occupation, marital status, and working hours per week. The data were split into training and test sets at a 7:3 ratio, and each model's predictions were evaluated using the following metrics: accuracy, recall, and F1 score. The final conclusion is that the random forest model performs best. Residents' income is inseparably linked to individual development and happiness and to social stability.
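A minimal sketch of the described experiment follows, assuming scikit-learn; the file name, label column, positive-class string, and network sizes are assumptions, not the paper's exact settings.

```python
# Hedged sketch: three classifiers on the Adult Income data, 7:3 split,
# accuracy/recall/F1. File name, "income" column, and ">50K" label are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, recall_score, f1_score

df = pd.read_csv("adult.csv")                    # hypothetical file name
X, y = df.drop(columns="income"), df["income"]   # label column assumed to be "income"
pre = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), X.select_dtypes("object").columns),
    remainder="passthrough")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("decision tree", DecisionTreeClassifier()),
                  ("random forest", RandomForestClassifier(n_estimators=200)),
                  ("neural network", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500))]:
    pred = make_pipeline(pre, clf).fit(X_tr, y_tr).predict(X_te)
    print(name, accuracy_score(y_te, pred),
          recall_score(y_te, pred, pos_label=">50K"),   # label string assumed
          f1_score(y_te, pred, pos_label=">50K"))
```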
Research Article Open Access
Predicting consumer acceptance of automobiles based on deep learning and traditional machine learning algorithms
Researchers have made significant progress in machine learning in recent years. Machine learning can learn from and make predictions on large, complex data sets, and its algorithms are commonly divided into two categories, deep learning and traditional machine learning, either of which can in principle be applied to a given prediction problem. This paper uses the "Car Data" dataset to compare deep learning with traditional machine learning. To find the algorithm best suited to analyzing and predicting consumers' acceptance of different cars, the paper explores the differences in prediction accuracy among three methods: neural networks, random forest, and support vector machine (SVM). We construct neural networks with three and with four hidden layers. Testing shows that random forest yields the worst predictions, the three-hidden-layer network achieves accuracy similar to the SVM's, and adding a fourth hidden layer raises accuracy above the SVM's. Adding a hidden layer can thus improve prediction accuracy, and both SVM and neural networks are suitable for analyzing the Car Data, but not all methods achieve similar predictive accuracy.
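The comparison described here might be set up as in the following sketch, assuming scikit-learn; the layer widths, file name, and encoding choice are illustrative assumptions, since the abstract does not specify them.

```python
# Hedged sketch: 3- vs 4-hidden-layer networks against RF and SVM on car data.
# Layer widths (64 units) and file/column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("car_data.csv")      # hypothetical file: categorical features + "class"
X = OrdinalEncoder().fit_transform(df.drop(columns="class"))
y = df["class"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "3-hidden-layer NN": MLPClassifier(hidden_layer_sizes=(64, 64, 64), max_iter=1000),
    "4-hidden-layer NN": MLPClassifier(hidden_layer_sizes=(64, 64, 64, 64), max_iter=1000),
}
for name, clf in models.items():
    print(name, accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te)))
```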
Research Article Open Access
Features of realized volatility analysis and return predicting based on LGBM and RNN model
This paper proposes a method for predicting the realized volatility of financial assets using LGBM and RNN models. The study utilizes convolutional neural networks to construct sub-indicators capturing the liquidity and volatility of financial assets, and these sub-indicators are combined into comprehensive measures of liquidity and volatility. Lognormal random-walk theory is applied to each asset dimension to price volatility for multiple assets, and the value of path-independent European options is obtained via multiple integration. The Monte Carlo method is applied to evaluate the integral, although it becomes inefficient in high-dimensional, orthogonal settings. The study also leverages LightGBM and other models to exploit the data efficiently, generate high returns, and achieve the highest Sharpe ratio. The dataset, which comes from a recognized international market maker, includes stock market data important for trade execution in financial markets, particularly order-book snapshots and executed trades. The study shows that the proposed method can accurately predict the realized volatility of financial assets.
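As a hedged sketch of the volatility-target construction and the LightGBM step, the code below computes per-window realized volatility from order-book weighted average prices and fits a regressor; the column names and window scheme are assumptions modeled on typical order-book snapshot data, not the paper's exact features.

```python
# Hedged sketch: realized-volatility target from order-book WAP, then LightGBM.
# Column names (time_id, bid_price1, ...) are assumptions about the snapshot format.
import numpy as np
import pandas as pd
import lightgbm as lgb

book = pd.read_csv("book_snapshots.csv")  # hypothetical order-book snapshot file

def window_stats(g):
    # weighted average price (WAP) from the top of the book
    wap = (g["bid_price1"] * g["ask_size1"] + g["ask_price1"] * g["bid_size1"]) / (
        g["bid_size1"] + g["ask_size1"])
    r = np.log(wap).diff().dropna()
    return pd.Series({
        "realized_vol": np.sqrt((r ** 2).sum()),                   # sqrt of summed squared log-returns
        "spread": (g["ask_price1"] / g["bid_price1"] - 1).mean(),  # simple liquidity proxy
    })

win = book.groupby("time_id").apply(window_stats)
X = win[["realized_vol", "spread"]].iloc[:-1]  # features from window t
y = win["realized_vol"].shift(-1).dropna()     # target: volatility of window t+1
model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05).fit(X, y)
```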
Research Article Open Access
Research on improvements of fraud detection system: basing on improved machine learning algorithms
Nowadays, commercial fraud commonly occurs in many industries. However, due to obstacles such as concept drift, imbalanced datasets, and the uneven distribution of fraud entries, Fraud Detection Systems (FDS) often fail to identify such behavior. Among the problems mentioned above, most research focuses on handling skewed datasets. This paper first presents common application scenarios of FDS, consisting of credit card fraud, insurance fraud, and supply chain fraud. It then introduces five representative methods for dealing with these problems: K Nearest Neighbors-Synthetic Minority Oversampling Technique-Long Short-Term Memory Networks (kNN-SMOTE-LSTM), Generative Adversarial Nets-AdaBoost-Decision Tree (GAN-AdaBoost-DT), Wasserstein GAN-Kernel Density Estimation-Gradient Boosting DT (WGAN-KDE-GBDT), Time-LSTM (TLSTM), and Adaptive Synthetic Sampling-Sequential Forward Selection-Random Forest (ADASYN-SFS-RF). kNN-SMOTE-LSTM adopts kNN as an identifying classifier so that only true samples are retained. GAN-AdaBoost-DT generates new samples without referring to real transactions. WGAN-KDE-GBDT instead uses the Wasserstein distance as its distance measure, which improves training speed and guarantees successful generation. TLSTM tries to account for the weights of different time intervals and measures the similarity between simulated and genuine behavior. ADASYN-SFS-RF employs the SFS algorithm, based on RF, to retain only optimal feature subsets. Finally, result metrics show that these improved algorithms do improve the overall performance of FDS, albeit with limitations on some indicators.
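The oversampling idea shared by several of these pipelines, a SMOTE-family resampler feeding a downstream classifier, can be sketched as follows. This is not any single paper's exact method: it substitutes a synthetic skewed dataset and a random forest for illustration, assuming scikit-learn and imbalanced-learn.

```python
# Hedged sketch of the shared oversampling pattern (SMOTE variant + classifier);
# synthetic data and random forest are illustrative stand-ins.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for a skewed fraud dataset: ~1% positive class
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample only the training set so the test distribution stays realistic
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_res, y_res)
print(classification_report(y_te, clf.predict(X_te)))
```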
Research Article Open Access
A review on machine learning methods for intrusion detection system
With increasing access to the Internet and the development of information technology, concerns about computer security have been raised on a considerably large scale. Computer crimes use various methods to undermine information privacy and system integrity, causing losses ranging from millions to trillions of dollars in the past few years. It is urgent to improve security algorithms and models so that they form a thorough structure for preventing attacks. Within this prevention structure, the intrusion detection system (IDS) plays a vital role in monitoring and detecting malicious behavior. However, due to the rapidly increasing variety of threats, traditional algorithms are no longer sufficient, and new methods should be brought into IDS to improve their functionality. Deep learning (DL) and machine learning (ML) are recently developed approaches that can process data on a considerably large scale; they can also make decisions and predictions without explicit programming, features that make them well suited to improving and enhancing IDS. This article focuses on a review of ML methods used in IDS construction.
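A representative supervised-ML IDS pipeline of the kind such reviews cover might look like the following sketch; the flow file, feature columns, and label column are hypothetical stand-ins for an NSL-KDD-style dataset.

```python
# Hedged sketch: a classifier trained on labeled network-flow features, as in
# typical ML-based IDS studies. File and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

flows = pd.read_csv("kdd_flows.csv")                 # hypothetical NSL-KDD-style export
X, y = flows.drop(columns="label"), flows["label"]   # label: "normal" vs attack classes
pre = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), X.select_dtypes("object").columns),
    remainder="passthrough")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ids = make_pipeline(pre, RandomForestClassifier(n_estimators=200)).fit(X_tr, y_tr)
print(classification_report(y_te, ids.predict(X_te)))
```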
Research Article Open Access
An AI-based ambulatory ankle brace with wearable sensor used for preventing ankle sprains
Ankle sprains are among the most common injuries in basketball. An ankle sprain can cause tremendous losses of time and money, and patients with a history of ankle sprain are susceptible to further ankle injuries. This paper proposes an AI-based ambulatory ankle brace with wearable sensors that can be used for ankle-sprain prevention. The equipment consists of a sensor, a microcomputer, a Bluetooth module, and a muscle stimulator. Ten volunteers performed twelve basketball moves while wearing the brace, and the moves were labeled as high-risk or low-risk. The sensor on the brace measured the three-dimensional angular velocity and angular displacement of the subject's ankle in real time, and the data were fed to different machine learning algorithms to build models that predict future ankle motions. The best-performing model, created with the random forest algorithm, was imported into the microcomputer. When the model predicts a high-risk move, the microcomputer sends a Bluetooth signal to the muscle stimulator, one end of which is a pair of electrodes attached to the peroneal muscles to restrict ankle motion. On receiving the "high-risk" signal, the stimulator is activated and the spraining motion is alleviated. In this way, the brace does not restrict normal ankle movement while still providing adequate protection against potential ankle sprains.
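A hedged sketch of the sensing-to-classification step appears below: windowed angular-velocity and angular-displacement readings are reduced to summary features and fed to a random forest. The window length, column names, and feature set are assumptions, not the paper's exact choices.

```python
# Hedged sketch: windowed IMU readings -> summary features -> random forest risk label.
# File name, column names, and 50-sample window are assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

imu = pd.read_csv("ankle_imu.csv")  # hypothetical: gx, gy, gz (deg/s), rx, ry, rz (deg), risk
WIN = 50                            # assumed window of 50 samples (~0.5 s at 100 Hz)
cols = ["gx", "gy", "gz", "rx", "ry", "rz"]

feats, labels = [], []
for start in range(0, len(imu) - WIN, WIN):
    w = imu.iloc[start:start + WIN]
    feats.append(np.concatenate([w[cols].mean(), w[cols].std(), w[cols].abs().max()]))
    labels.append(w["risk"].mode()[0])  # majority risk label of the window

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, np.array(feats), labels, cv=5).mean())
```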
Research Article Open Access
Scheme for improving RAFT-based blockchain performance
Improving the performance of RAFT-based blockchain systems, and understanding how it relates to environmental factors, has received little attention in recent studies, yet it is essential in production environments. Achieving this requires analyzing system performance, especially the blockchain system's throughput, latency, and robustness. An evaluation of the two most widely used RAFT-based platforms, etcd and Hyperledger Fabric, was conducted to discover the factors influencing system performance and the methods for improving it. The evaluation focused mainly on throughput, latency, and robustness, covering the reading and writing processes, varying the number of keys, connections, and clients in etcd, and comparing the two processes and the two platforms. Only the number of clients significantly impacts etcd's performance, and etcd performs better than Hyperledger Fabric. Moreover, both platforms show that reading performs better than writing. Therefore, controlling the number of clients and focusing on the writing process are key to improving system performance.
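A read/write micro-benchmark of the kind described might be set up as in this sketch, assuming the python-etcd3 client and a local etcd node on the default port; the key count and value size are arbitrary choices, not the paper's configuration.

```python
# Hedged sketch: etcd put/get throughput and average latency with python-etcd3.
# Assumes a local etcd node on port 2379; N and value size are arbitrary.
import time
import etcd3

client = etcd3.client(host="localhost", port=2379)
N = 1000

start = time.perf_counter()
for i in range(N):
    client.put(f"bench/key{i}", "x" * 256)  # 256-byte values, an arbitrary choice
write_s = time.perf_counter() - start

start = time.perf_counter()
for i in range(N):
    client.get(f"bench/key{i}")
read_s = time.perf_counter() - start

print(f"write: {N / write_s:.0f} ops/s, avg latency {1000 * write_s / N:.2f} ms")
print(f"read:  {N / read_s:.0f} ops/s, avg latency {1000 * read_s / N:.2f} ms")
```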