Articles in this Volume

Research Article | Open Access
Optimization study of pruning strategy based on KNN trajectory similarity query
With the development of GPS positioning technology, large volumes of spatiotemporal trajectory data are being generated. Because trajectory data has a complex structure, with irregular spatial shapes and continuous temporal-sequence attributes, querying massive trajectory datasets is challenging. The pruning strategies of existing trajectory similarity query methods are not very effective: even after pruning, a large number of trajectories still require distance calculations to confirm whether they are similar. This paper proposes multiple local pruning optimizations that reduce the candidate set as far as possible. Specifically, the method starts with region pruning in the index space, eliminating index spaces whose distance from the query trajectory exceeds the maximum similarity distance. It then performs distance pruning between trajectories, removing trajectories that do not meet the distance conditions. Finally, it adds a termination condition: when the distance between the current index space and the query trajectory exceeds the maximum similarity distance and the result set already contains K trajectories, the loop exits and the query ends. Tests on real datasets demonstrate that the proposed KTSS method outperforms current algorithms of the same type.
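The three-stage pruning described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's KTSS implementation: the index layout, the point-to-trajectory distance measure, and all function names are assumptions made for the example.

```python
import math

def euclidean(p, q):
    """Distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def trajectory_distance(t1, t2):
    """A simple stand-in for a trajectory similarity measure:
    mean distance from each point of t1 to its nearest point of t2."""
    return sum(min(euclidean(p, q) for q in t2) for p in t1) / len(t1)

def knn_query(query, index, k):
    """Three-stage pruning sketch. `index` is a list of
    (min_region_distance, trajectories) pairs; regions are visited
    in order of their minimum distance to the query."""
    results = []             # (distance, trajectory), kept sorted, at most k
    max_sim = float("inf")   # distance of the current k-th neighbour
    for region_dist, trajectories in sorted(index):
        # Region pruning / termination: once k results exist and the
        # nearest remaining region is farther than the k-th result, stop.
        if len(results) >= k and region_dist > max_sim:
            break
        for t in trajectories:
            d = trajectory_distance(query, t)
            # Distance pruning: skip trajectories that cannot enter the top k.
            if d < max_sim or len(results) < k:
                results.append((d, t))
                results.sort(key=lambda x: x[0])
                results = results[:k]
                if len(results) == k:
                    max_sim = results[-1][0]
    return results
```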
Research Article | Open Access
Enhancing conversational recommendation systems through the integration of KNN with ConLinUCB contextual bandits
In recommender system research, contextual multi-armed bandits have shown promise in delivering tailored recommendations by utilizing contextual data. However, their effectiveness is often curtailed by the cold start problem, arising from the lack of initial user data. This necessitates extensive exploration to ascertain user preferences, consequently impeding the speed of learning. The advent of conversational recommendation systems offers a solution. Through these systems, the conversational contextual bandit algorithm swiftly learns user preferences for specific key-terms via interactive dialogues, thereby enhancing the learning pace. Despite these advancements, there are limitations in current methodologies. A primary issue is the suboptimal integration of data from key-term-centric dialogues and arm-level recommendations, which could otherwise expedite the learning process. Another crucial aspect is the strategic suggestion of exploratory key phrases. These phrases are essential in quickly uncovering users’ potential interests in various domains, thus accelerating the convergence of accurate user preference models. Addressing these challenges, the ConLinUCB framework emerges as a groundbreaking solution. It ingeniously combines feedback from both arm-level and key-term-level interactions, significantly optimizing the learning trajectory. Building upon this, the framework integrates a K-nearest neighbour (KNN) approach to refine key-term selection and arm recommendations. This integration hinges on the similarity of user preferences, further hastening the convergence of the parameter vectors.
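As a rough illustration of how a KNN step could pool feedback across users with similar preferences (the paper's actual integration with ConLinUCB is more involved; the Euclidean similarity and all names here are assumptions for the example):

```python
import math

def k_nearest_users(target, user_vectors, k):
    """Indices of the k users whose estimated preference vectors
    are closest (Euclidean) to the target user's vector."""
    dists = [(math.dist(target, v), i) for i, v in enumerate(user_vectors)]
    return [i for _, i in sorted(dists)[:k]]

def pooled_preference(target, user_vectors, k):
    """Average the target's vector with its k nearest neighbours:
    a simple way to share feedback between similar users and speed
    up convergence of the preference estimate."""
    idx = k_nearest_users(target, user_vectors, k)
    pooled = list(target)
    for i in idx:
        for d in range(len(target)):
            pooled[d] += user_vectors[i][d]
    return [x / (k + 1) for x in pooled]
```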
Research Article | Open Access
Comparative analysis and applications of classic multi-armed bandit algorithms and their variants
The multi-armed bandit problem, a pivotal aspect of Reinforcement Learning (RL), presents a classic dilemma in sequential decision-making, balancing exploration with exploitation. Renowned bandit algorithms like Explore-Then-Commit, Epsilon-Greedy, SoftMax, Upper Confidence Bound (UCB), and Thompson Sampling have demonstrated efficacy in addressing this issue. Nevertheless, each algorithm exhibits unique strengths and weaknesses, necessitating a detailed comparative evaluation. This paper executes a series of implementations of various established bandit algorithms and their derivatives, aiming to assess their stability and efficacy. The study engages in empirical analysis utilizing a real dataset, generating charts and data for a thorough examination of the pros and cons associated with each algorithm. A significant aspect of the research focuses on the parameter sensitivity of these algorithms and the impact of parameter tuning on their performance. Findings reveal that the SoftMax algorithm's effectiveness is markedly influenced by the initial estimated mean reward value for each arm. Conversely, algorithms like Epsilon-Greedy and UCB exhibit enhanced performance with optimal parameter settings. Furthermore, the study explores the limitations inherent in classic bandit algorithms and introduces some innovative models and methodologies pertinent to the multi-armed bandit problem, along with their applications.
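Two of the compared selection rules can be sketched compactly; the parameter sensitivity the abstract notes (SoftMax's dependence on the estimated mean rewards and its temperature, ε-greedy's dependence on ε) is visible directly in the formulas. A minimal sketch, not the paper's experimental code:

```python
import math
import random

def softmax_probs(q_values, tau=0.1):
    """Boltzmann exploration: action probabilities from estimated
    mean rewards; tau is the temperature parameter."""
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]  # shift for stability
    s = sum(exps)
    return [e / s for e in exps]

def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon explore a uniformly random arm,
    otherwise exploit the empirically best arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With a low temperature, even a small gap in the initial estimates makes SoftMax nearly deterministic, which is one way its sensitivity to initialization shows up.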
Research Article | Open Access
Strategic insights from multi-armed bandits: Applications in real-time strategy games
In real-time strategy games, players often face uncertainty regarding which strategy will lead to victory. This paper delves into how multi-armed bandit (MAB) algorithms can assist in this context, beginning with an exploration of MAB's theoretical principles, particularly the crucial balance between exploration and exploitation. The study compares the efficacy of the Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), and Thompson Sampling (TS) algorithms through practical experimentation. Beyond gaming, the paper also considers the broader implications of MAB algorithms in healthcare, finance, and dynamic pricing within online retail sectors. A focal point of the research is the application of UCB and TS algorithms in StarCraft, a popular real-time strategy game. The performance of these algorithms is rigorously evaluated by calculating the cumulative regret value, a key metric in assessing strategic effectiveness. The findings suggest that the implementation of UCB and TS algorithms significantly enhances players' winning rates in the game. While the results are promising, the paper acknowledges ongoing challenges and encourages further exploration into this fascinating and valuable area of study. This research not only contributes to the understanding of strategic decision-making in gaming but also signals potential cross-sectoral applications of MAB algorithms.
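Cumulative regret, the metric used in the evaluation above, is the running sum of the gap between the best arm's mean reward and the played arm's mean. A minimal UCB1 sketch on Bernoulli arms that tracks it (names and setup are illustrative, not the paper's StarCraft experiment):

```python
import math
import random

def run_ucb1(means, horizon, seed=0):
    """UCB1 on Bernoulli arms; returns cumulative regret per step."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    values = [0.0] * n_arms        # empirical mean reward per arm
    best = max(means)
    regret, total = [], 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # play each arm once first
        else:
            # Pick the arm maximizing mean + exploration bonus.
            arm = max(range(n_arms),
                      key=lambda a: values[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += best - means[arm]     # pseudo-regret of this pull
        regret.append(total)
    return regret
```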
Research Article | Open Access
Optimizing video click-through rates with bandit algorithms
In recent years, videos have increasingly influenced public perception, making video platforms a focal point of digital consumption. One critical challenge for platform operators is identifying videos that resonate most with users, as user ratings directly reflect viewer preferences and experiences. This study explores the use of bandit algorithms to predict and strategize the overall ratings of various anime videos on the Bilibili platform. Bandit algorithms, which address the multi-armed bandit problem, dynamically adjust selection strategies based on prior feedback to maximize cumulative rewards. Our empirical research assessed multiple bandit algorithms, including the ε-greedy, Upper Confidence Bound (UCB), Explore-then-Commit (ETC), and Thompson Sampling (TS) algorithms. The findings indicate that the Thompson Sampling algorithm achieved the lowest cumulative regret in selecting optimal videos on the Bilibili platform, showcasing its superior performance. This study highlights the potential of bandit algorithms to enhance video selection processes, ensuring that platforms can effectively cater to user preferences and improve viewer satisfaction.
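Thompson Sampling for Bernoulli feedback (e.g. click/no-click on a video) can be sketched with Beta posteriors. This is a generic illustration, not the study's Bilibili experiment; the click rates and names are assumptions:

```python
import random

def thompson_step(successes, failures, rng):
    """Sample each arm's Beta posterior and play the argmax."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_thompson(click_rates, horizon, seed=0):
    """Bernoulli bandit over videos; returns final cumulative regret."""
    rng = random.Random(seed)
    n = len(click_rates)
    succ, fail = [0] * n, [0] * n
    best = max(click_rates)
    regret = 0.0
    for _ in range(horizon):
        arm = thompson_step(succ, fail, rng)
        if rng.random() < click_rates[arm]:
            succ[arm] += 1      # posterior update on a click
        else:
            fail[arm] += 1      # posterior update on no click
        regret += best - click_rates[arm]
    return regret
```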
Research Article | Open Access
Enhancing movie recommendations through comparative analysis of UCB algorithm variants
In the digital realm, recommendation systems are pivotal in shaping user experiences on online platforms, tailoring content based on user feedback. A notable algorithm in this domain is the multi-armed bandit algorithm, with the Upper Confidence Bound (UCB) emerging as a classic and effective variant. This paper delves into an array of Upper Confidence Bound algorithm variations, encompassing UCB1, Asymptotically Optimal UCB, UCB-V, and UCB1-Tuned. The research harnesses the MovieLens dataset to assess the performance of these algorithms, employing cumulative regret as the primary metric. For the parameter l in UCB1 and c in UCB-V, values that are either too large or too small lead to worse outcomes, while UCB1-Tuned outperforms the other three algorithms in this experiment because it accounts for reward variance and adjusts its index dynamically. The study demonstrates that choosing an appropriate UCB index is crucial for enhancing the performance of UCB algorithms in recommendation systems, which is significant both for improving recommendation algorithms and for enhancing the user experience.
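For reference, the UCB1 and UCB1-Tuned indices differ only in the exploration bonus: UCB1-Tuned caps an empirical-variance term at 1/4, the maximum variance of a [0,1]-bounded reward, which is why it adapts to low-variance arms. A sketch of the two index formulas (parameter names are illustrative):

```python
import math

def ucb1_index(mean, count, t):
    """Classic UCB1: empirical mean plus a count-based bonus."""
    return mean + math.sqrt(2 * math.log(t) / count)

def ucb1_tuned_index(mean, sq_mean, count, t):
    """UCB1-Tuned: the constant 2 is replaced by an empirical
    variance estimate, capped at 1/4 (max Bernoulli variance).
    `sq_mean` is the empirical mean of the squared rewards."""
    variance = sq_mean - mean ** 2
    v = variance + math.sqrt(2 * math.log(t) / count)
    return mean + math.sqrt((math.log(t) / count) * min(0.25, v))
```

For an arm with zero observed variance, the tuned bonus is strictly smaller than UCB1's, so the algorithm wastes fewer pulls re-exploring it.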
Research Article | Open Access
Applying Multi-Armed Bandit algorithms for music recommendations at Spotify
This study explores the application of multi-armed bandit algorithms in enhancing music recommendation systems, with a focus on Spotify. It delves into the Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), and Thompson Sampling (TS) algorithms, evaluating their efficacy within the Spotify context. The primary objective is to determine which algorithm optimally balances exploration and exploitation to maximize user satisfaction and engagement. The research reveals that the ETC algorithm, with its rigid exploration and exploitation phases, incurs a notably higher regret value. This rigidity can lead to missed opportunities in identifying optimal choices and hinder adaptability to evolving user preferences. Conversely, the UCB and TS algorithms exhibit remarkable adaptability and a flexible balance between exploration and exploitation. This flexibility translates into more personalized and satisfactory user experiences in music recommendations. However, the selection of the most appropriate algorithm should be contingent on the size and characteristics of the specific user dataset, as well as the fine-tuning of algorithm parameters to align with user preferences and behaviors.
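The rigid phase structure of ETC that the abstract criticizes is easy to see in code: the exploration length is fixed up front and the commitment is never revisited, so a shift in user preferences after the commit goes unnoticed. A generic sketch on Bernoulli arms, not the study's Spotify experiment:

```python
import random

def run_etc(means, m, horizon, seed=0):
    """Explore-Then-Commit: play every arm m times round-robin,
    then commit to the empirically best arm for the remaining
    rounds. Returns the final cumulative (pseudo-)regret."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    values = [0.0] * n       # empirical mean per arm
    best = max(means)
    committed = None
    regret = 0.0
    for t in range(horizon):
        if t < n * m:
            arm = t % n                   # rigid exploration phase
        else:
            if committed is None:         # commit once, never revisit
                committed = max(range(n), key=values.__getitem__)
            arm = committed
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        regret += best - means[arm]
    return regret
```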
Research Article | Open Access
Advancing decision-making strategies through a comprehensive study of Multi-Armed Bandit algorithms and applications
Multi-Armed Bandit (MAB) strategies play a pivotal role in decision-making algorithms by adeptly managing the exploration-exploitation trade-off in environments characterized by multiple options and constrained resources. This paper delves into the core MAB algorithms, including Explore-Then-Commit (ETC), Thompson Sampling, and Upper Confidence Bound (UCB). It provides a detailed examination of their theoretical underpinnings and their application across diverse sectors such as recommender systems, healthcare, and finance. MAB algorithms are celebrated for their efficiency in optimizing decision outcomes; however, they are not without challenges. Significant issues include managing the complexity of exploration and adapting to non-stationary environments where the dynamics of the available options may change over time. A nuanced understanding of these challenges is crucial for effectively implementing MAB strategies in complex decision-making scenarios. This study not only highlights the versatility and potential of MAB algorithms but also underscores the need for ongoing research to refine these techniques and expand their applicability.
Research Article | Open Access
Performance variance in Multi-Armed Bandits: In-depth analysis of three core algorithms
In real-time strategy games, players grapple with uncertainty regarding the best strategy for victory. This paper delves into multi-armed bandit (MAB) algorithms as potential solutions. The theoretical foundations of MAB are explored, with a focus on the crucial balance between exploration and exploitation. An experimental comparison of the Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), and Thompson Sampling (TS) algorithms is conducted, showcasing their varied performance. Beyond gaming, the paper also examines the broader applications of MAB algorithms in fields such as healthcare, finance, and dynamic pricing in online retail, highlighting their versatility. A significant portion of the study is dedicated to implementing the UCB and TS algorithms in StarCraft, a popular real-time strategy game. The performance of these algorithms is assessed by calculating cumulative regret values, a metric critical to understanding their effectiveness in decision-making contexts. The results indicate that both UCB and TS algorithms substantially improve players' win rates in StarCraft. However, the study acknowledges existing challenges and the need for further research in this area. The use of MAB algorithms in complex, dynamic environments like real-time strategy games presents a rich avenue for exploration and holds significant promise for enhancing decision-making strategies in diverse domains. This research, therefore, not only contributes to the understanding of MAB algorithms in gaming but also underscores their potential in various other sectors.
Research Article | Open Access
Optimizing decision-making in uncertain environments through analysis of stochastic stationary Multi-Armed Bandit algorithms
Reinforcement learning traditionally plays a pivotal role in artificial intelligence and various practical applications, focusing on the interaction between an agent and its environment. Within this broad field, the multi-armed bandit (MAB) problem represents a specific subset, characterized by a sequential interaction between a learner and an environment where the agent’s actions do not alter the environment or reward distributions. MABs are prevalent in recommendation systems and advertising and are increasingly applied in sectors like agriculture and adaptive clinical trials. The stochastic stationary bandit problem, a fundamental category of MAB, is the primary focus of this article. Here, we delve into the implementation and analytical comparison of several key bandit algorithms—including Explore-then-Commit (ETC), Upper Confidence Bound (UCB), Thompson Sampling (TS), Epsilon-Greedy (ɛ-Greedy), SoftMax, and Conservative Lower Confidence Bound (CON-LCB)—across various datasets. These datasets vary in the number of options (arms), reward distributions, and specific parameters, offering a broad testing ground. Additionally, this work provides an overview of the diverse applications of bandit problems across different fields, highlighting their versatility and broad impact.