<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Издательство</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2025.50.3.029</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">1991</article-id>
      <title-group>
        <article-title xml:lang="ru">Гибридная система обучения агентов с использованием A2C и эволюционных стратегий</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Hybrid agent training system using A2C and evolutionary strategies</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Корчагин</surname>
              <given-names>Алексей Павлович</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Korchagin</surname>
              <given-names>Aleksei Pavlovich</given-names>
            </name>
          </name-alternatives>
          <email>aleksei.korchagin200@mail.ru</email>
          <xref ref-type="aff">aff-1</xref>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Воронежский государственный университет</aff>
        <aff xml:lang="en">Voronezh State University</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <elocation-id>10.26102/2310-6018/2025.50.3.029</elocation-id>
      <permissions>
        <copyright-statement>Copyright © Авторы, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=1991"/>
      <abstract xml:lang="ru">
        <p>Актуальность исследования обусловлена необходимостью повышения эффективности обучения агентов в условиях частичной наблюдаемости и ограниченного взаимодействия, характерных для многих реальных задач в мультиагентных системах. В связи с этим данная статья направлена на разработку и анализ гибридного подхода к обучению агентов, сочетающего преимущества градиентных и эволюционных методов. Ведущим методом исследования является модифицированный алгоритм Advantage Actor-Critic (A2C), дополненный элементами эволюционного обучения – кроссовером и мутацией параметров нейросети. Такой подход позволяет комплексно рассмотреть проблему адаптации агентов в условиях ограниченного обзора и кооперативного взаимодействия. В статье представлены результаты экспериментов в среде с двумя кооперативными агентами, задачей которых является извлечение и доставка ресурсов. Показано, что гибридная методика обучения обеспечивает значительный рост эффективности поведения агентов по сравнению с чисто градиентными подходами. Динамика среднего вознаграждения свидетельствует об устойчивости метода и его потенциале в более сложных сценариях многоагентного взаимодействия. Материалы статьи представляют практическую ценность для специалистов в области обучения с подкреплением, разработки мультиагентных систем и построения адаптивных кооперативных стратегий в условиях ограниченной информации.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>The relevance of the study is due to the need to increase the efficiency of agent training under conditions of partial observability and limited interaction, which are typical for many real-world tasks in multi-agent systems. In this regard, the present article is aimed at the development and analysis of a hybrid approach to agent training that combines the advantages of gradient-based and evolutionary methods. The main method of the study is a modified Advantage Actor-Critic (A2C) algorithm, supplemented with elements of evolutionary learning — crossover and mutation of neural network parameters. This approach allows for a comprehensive consideration of the problem of agent adaptation in conditions of limited observation and cooperative interaction. The article presents the results of experiments in an environment with two cooperative agents tasked with extracting and delivering resources. It is shown that the hybrid training method provides a significant increase in the effectiveness of agent behavior compared to purely gradient-based approaches. The dynamics of the average reward indicate the stability of the method and its potential for more complex multi-agent interaction scenarios. The materials of the article have practical value for specialists in the fields of reinforcement learning, multi-agent system development, and the design of adaptive cooperative strategies under limited information.</p>
      </trans-abstract>
      <kwd-group xml:lang="ru">
        <kwd>обучение с подкреплением</kwd>
        <kwd>эволюционные алгоритмы</kwd>
        <kwd>многоагентная система</kwd>
        <kwd>A2C</kwd>
        <kwd>LSTM</kwd>
        <kwd>кооперативное обучение</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>reinforcement learning</kwd>
        <kwd>evolutionary algorithms</kwd>
        <kwd>multi-agent system</kwd>
        <kwd>A2C</kwd>
        <kwd>LSTM</kwd>
        <kwd>cooperative learning</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Исследование выполнено без спонсорской поддержки.</funding-statement>
        <funding-statement xml:lang="en">The study was performed without external funding.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">Yadav A., Kumar A., Choudhary Ch. Integrated Swarm Intelligence Framework for Dynamic Traffic Optimization in Delhi: A Three-Layer PSO-Fuzzy-MAS Approach. International Scientific Journal of Engineering and Management. 2025;04(05). https://doi.org/10.55041/ISJEM03921</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Icarte-Ahumada G., He Zh., Godoy V., García F., Oyarzún M. A Multi-Agent System for Parking Allocation: An Approach to Allocate Parking Spaces. Electronics. 2025;14(5). https://doi.org/10.3390/electronics14050840</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Dey S., Munsi A., Pradhan S., Aditya K. Bidirectional Wireless System for Drone to Drone Opportunity Charging in a Multi Agent System. In: 2023 International Conference on Control, Communication and Computing (ICCC), 19–21 May 2023, Thiruvananthapuram, India. IEEE; 2023. P. 1–5. https://doi.org/10.1109/ICCC57789.2023.10164995</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Souli N., Kolios P., Ellinas G. Multi-Agent System for Rogue Drone Interception. IEEE Robotics and Automation Letters. 2023;8(4):2221–2228. https://doi.org/10.1109/LRA.2023.3245412</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Sanghi N. Deep Q-Learning (DQN). In: Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models. Berkeley: Apress; 2024. P. 225–271. https://doi.org/10.1007/979-8-8688-0273-7_6</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">Jeungthanasirigool W., Sirimaskasem Th., Boonraksa T., Boonraksa P. Comparison of PPO-DRL and A2C-DRL Algorithms for MPPT in Photovoltaic Systems via Buck-Boost Converter. International Journal of Innovative Research and Scientific Studies. 2025;8(3):2438–2453. https://doi.org/10.53894/ijirss.v8i3.7022</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">Del Rio A., Jimenez D., Serrano J. Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments. IEEE Access. 2024;12:146795–146806. https://doi.org/10.1109/ACCESS.2024.3472473</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Chen T.-Yo., Chen W.-N., Hao J.-K., Wang Ya., Zhang J. Multi-Agent Evolution Strategy with Cooperative and Cumulative Step Adaptation for Black-Box Distributed Optimization. IEEE Transactions on Evolutionary Computation. 2025. https://doi.org/10.1109/TEVC.2025.3525713</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Hochreiter S., Schmidhuber J. Long Short-Term Memory. Neural Computation. 1997;9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Kingma D.P., Ba J. Adam: A Method for Stochastic Optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), 07–09 May 2015, San Diego, CA, USA. 2015. URL: https://arxiv.org/abs/1412.6980</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The authors declare that there are no conflicts of interest present.</p>
      </fn>
    </fn-group>
  </back>
</article>