<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Издательство</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2025.50.3.029</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">1991</article-id>
      <title-group>
        <article-title xml:lang="ru">Гибридная система обучения агентов с использованием A2C и эволюционных стратегий</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Hybrid agent training system using A2C and evolutionary strategies</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Корчагин</surname>
              <given-names>Алексей Павлович</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Korchagin</surname>
              <given-names>Aleksei Pavlovich</given-names>
            </name>
          </name-alternatives>
          <email>aleksei.korchagin200@mail.ru</email>
          <xref ref-type="aff">aff-1</xref>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Воронежский государственный университет</aff>
        <aff xml:lang="en">Voronezh State University</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <elocation-id>10.26102/2310-6018/2025.50.3.029</elocation-id>
      <permissions>
        <copyright-statement>Copyright © Авторы, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=1991"/>
      <abstract xml:lang="ru">
        <p>Актуальность исследования обусловлена необходимостью повышения эффективности обучения агентов в условиях частичной наблюдаемости и ограниченного взаимодействия, характерных для многих реальных задач в мультиагентных системах. В связи с этим данная статья направлена на разработку и анализ гибридного подхода к обучению агентов, сочетающего преимущества градиентных и эволюционных методов. Ведущим методом исследования является модифицированный алгоритм Advantage Actor-Critic (A2C), дополненный элементами эволюционного обучения – кроссовером и мутацией параметров нейросети. Такой подход позволяет комплексно рассмотреть проблему адаптации агентов в условиях ограниченного обзора и кооперативного взаимодействия. В статье представлены результаты экспериментов в среде с двумя кооперативными агентами, задачей которых является извлечение и доставка ресурсов. Показано, что гибридная методика обучения обеспечивает значительный рост эффективности поведения агентов по сравнению с чисто градиентными подходами. Динамика среднего вознаграждения свидетельствует об устойчивости метода и его потенциале в более сложных сценариях многоагентного взаимодействия. Материалы статьи представляют практическую ценность для специалистов в области обучения с подкреплением, разработки мультиагентных систем и построения адаптивных кооперативных стратегий в условиях ограниченной информации.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>The relevance of the study is due to the need to increase the efficiency of agent training under conditions of partial observability and limited interaction, which are typical for many real-world tasks in multi-agent systems. In this regard, the present article is aimed at the development and analysis of a hybrid approach to agent training that combines the advantages of gradient-based and evolutionary methods. The main method of the study is a modified Advantage Actor-Critic (A2C) algorithm, supplemented with elements of evolutionary learning — crossover and mutation of neural network parameters. This approach allows for a comprehensive consideration of the problem of agent adaptation in conditions of limited observation and cooperative interaction. The article presents the results of experiments in an environment with two cooperative agents tasked with extracting and delivering resources. It is shown that the hybrid training method provides a significant increase in the effectiveness of agent behavior compared to purely gradient-based approaches. The dynamics of the average reward indicate the stability of the method and its potential for more complex multi-agent interaction scenarios. The materials of the article have practical value for specialists in the fields of reinforcement learning, multi-agent system development, and the design of adaptive cooperative strategies under limited information.</p>
      </trans-abstract>
      <kwd-group xml:lang="ru">
        <kwd>обучение с подкреплением</kwd>
        <kwd>эволюционные алгоритмы</kwd>
        <kwd>многоагентная система</kwd>
        <kwd>A2C</kwd>
        <kwd>LSTM</kwd>
        <kwd>кооперативное обучение</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>reinforcement learning</kwd>
        <kwd>evolutionary algorithms</kwd>
        <kwd>multi-agent system</kwd>
        <kwd>A2C</kwd>
        <kwd>LSTM</kwd>
        <kwd>cooperative learning</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Исследование выполнено без спонсорской поддержки.</funding-statement>
        <funding-statement xml:lang="en">The study was performed without external funding.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">Yadav A., Kumar A., Choudhary Ch. Integrated Swarm Intelligence Framework for Dynamic Traffic Optimization in Delhi: A Three-Layer PSO-Fuzzy-MAS Approach. International Scientific Journal of Engineering and Management. 2025;04(05). https://doi.org/10.55041/ISJEM03921</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Icarte-Ahumada G., He Zh., Godoy V., García F., Oyarzún M. A Multi-Agent System for Parking Allocation: An Approach to Allocate Parking Spaces. Electronics. 2025;14(5). https://doi.org/10.3390/electronics14050840</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Dey S., Munsi A., Pradhan S., Aditya K. Bidirectional Wireless System for Drone to Drone Opportunity Charging in a Multi Agent System. In: 2023 International Conference on Control, Communication and Computing (ICCC), 19–21 May 2023, Thiruvananthapuram, India. IEEE; 2023. P. 1–5. https://doi.org/10.1109/ICCC57789.2023.10164995</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Souli N., Kolios P., Ellinas G. Multi-Agent System for Rogue Drone Interception. IEEE Robotics and Automation Letters. 2023;8(4):2221–2228. https://doi.org/10.1109/LRA.2023.3245412</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Sanghi N. Deep Q-Learning (DQN). In: Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models. Berkeley: Apress; 2024. P. 225–271. https://doi.org/10.1007/979-8-8688-0273-7_6</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">Jeungthanasirigool W., Sirimaskasem Th., Boonraksa T., Boonraksa P. Comparison of PPO-DRL and A2C-DRL Algorithms for MPPT in Photovoltaic Systems via Buck-Boost Converter. International Journal of Innovative Research and Scientific Studies. 2025;8(3):2438–2453. https://doi.org/10.53894/ijirss.v8i3.7022</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">Del Rio A., Jimenez D., Serrano J. Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments. IEEE Access. 2024;12:146795–146806. https://doi.org/10.1109/ACCESS.2024.3472473</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Chen T.-Yo., Chen W.-N., Hao J.-K., Wang Ya., Zhang J. Multi-Agent Evolution Strategy with Cooperative and Cumulative Step Adaptation for Black-Box Distributed Optimization. IEEE Transactions on Evolutionary Computation. 2025. https://doi.org/10.1109/TEVC.2025.3525713</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Hochreiter S., Schmidhuber J. Long Short-Term Memory. Neural Computation. 1997;9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Kingma D.P., Ba J. Adam: A Method for Stochastic Optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), 07–09 May 2015, San Diego, CA, USA. 2015. URL: https://arxiv.org/abs/1412.6980</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The authors declare that there are no conflicts of interest present.</p>
      </fn>
    </fn-group>
  </back>
</article>