moitvivt

Modeling, Optimization and Information Technology

2310-6018

Publisher

10.26102/2310-6018/2024.44.1.025

1520

Mathematical model of a universal control system for a walking robot based on reinforcement learning methods

Kashko

Vasily Vasilievich

vasya.kashko@mail.ru

0000-0002-0333-2313

Oleinikova

Svetlana Alexandrovna

s.a.oleynikova@gmail.com

Voronezh State Technical University

01 01 2026

1 1

2026

This work is licensed under a Creative Commons Attribution 4.0 International License

Modern approaches to controlling walking robots with rotary links are disparate algorithms built either on a ready-made locomotor program that is subsequently adapted, or on complex kinematic-dynamic models that require extensive knowledge of the dynamics of the system and the environment, which is often infeasible in applied problems. In addition, these approaches are tightly coupled to the configuration of the walking robot, so a method developed for one robot cannot be applied to a robot with a different configuration (a different number and type of limbs). This article proposes a universal approach to controlling the motion of walking robots based on the reinforcement learning methodology. A mathematical model of the control system based on finite discrete Markov processes is considered in the context of reinforcement learning methods. The goal is to build a universal and adaptive control system that, through continuous interaction, can find an optimal strategy for executing a locomotor program in a previously unknown environment. The scientifically novel result is a mathematical model of this system that describes its operation in terms of Markov chains. Unlike existing analogues, the approach provides a unified description of the robot.
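The abstract frames control as a finite discrete Markov decision process solved by reinforcement learning through interaction with an unknown environment. As a minimal, hypothetical illustration (not the authors' model), the sketch below runs tabular Q-learning, one standard method from Sutton and Barto, on a toy five-state chain: the agent never sees the transition rules, only the results of `step`, and learns that stepping forward is optimal. The state space, rewards, hyperparameters and all names (`step`, `q_learning`, `greedy_policy`) are assumptions chosen for illustration.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy finite MDP (an assumption, NOT the authors' robot model):
# states 0..N_STATES-1 are discretized positions along a track;
# action 1 ("step forward") advances by one, action 0 ("stay") does nothing.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1

QTable = Dict[Tuple[int, int], float]


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Environment transition, unknown to the agent: returns (next_state, reward, done)."""
    next_state = min(state + action, GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else -0.1  # small cost per step, reward on reaching the goal
    return next_state, reward, done


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.2, seed: int = 0) -> QTable:
    """Tabular Q-learning: action values are learned only from interaction via step()."""
    rng = random.Random(seed)
    q: QTable = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:  # cap episode length
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # terminal transitions have no future value
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            steps += 1
    return q


def greedy_policy(q: QTable) -> List[int]:
    """Best learned action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Tabular Q-learning is used here only because it is the simplest model-free method for a finite MDP; a real walking robot with continuous joint angles would need a much richer state description or function approximation such as the neural networks listed in the keywords.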

control system; reinforcement learning; Markov decision processes; neural networks; walking robot; artificial intelligence

The study was performed without external funding.

References

Paulo J., Asdadi A., Peixoto P., Amorim P. Human gait pattern changes detection system: A multimodal vision-based and novelty detection learning approach. Biocybernetics and Biomedical Engineering. 2017;37(4):701–717.

Shimmyo S., Sato T., Ohnishi K. Biped walking pattern generation by using preview control based on three-mass model. IEEE Transactions on Industrial Electronics. 2012;60(11):5137–5147. DOI: 10.1109/TIE.2012.2221111.

Smith L., Kew J., Li T., Luu L., Peng X., Ha S., Tan J., Levine S. Learning and Adapting Agile Locomotion Skills by Transferring Experience. Robotics: Science and Systems XIX. 2023. DOI: 10.15607/RSS.2023.XIX.051 (accessed on 11.02.2024).

Braun D.J., Mitchell J.E., Goldfarb M. Actuated dynamic walking in a seven-link biped robot. IEEE/ASME Transactions on Mechatronics. 2010;17(1):147–156. DOI: 10.1109/TMECH.2010.2090891.

Bebek O., Erbatur K. A gait adaptation scheme for biped walking robots. The 8th IEEE International Workshop on Advanced Motion Control. 2004;409–414. DOI: 10.1109/AMC.2004.1297904.

Arakawa T., Fukuda T. Natural motion trajectory generation of biped locomotion robot using genetic algorithm through energy optimization. 1996 IEEE International Conference on Systems, Man and Cybernetics. Information Intelligence and Systems (Cat. No.96CH35929). 1996;2:1495–1500. DOI: 10.1109/ICSMC.1996.571368.

Luu T.P., Lim H.B., Hoon K.H., Qu X., Low K.H. Subject-specific gait parameters prediction for robotic gait rehabilitation via generalized regression neural network. 2011 IEEE International Conference on Robotics and Biomimetics. 2011;914–919. DOI: 10.1109/ROBIO.2011.6181404.

Ouyang W., Chi H., Pang J., Liang W., Ren Q. Adaptive Locomotion Control of a Hexapod Robot via Bio-Inspired Learning. Front Neurorobot. 2021;15:627157. DOI: 10.3389/fnbot.2021.627157.

Hrdlicka I., Kutilek P. Reinforcement learning in control systems for walking hexapod robots. Cybernetic Letters. 2005;3:1–13.

Fu H., Tang K., Li P., Zhang W., Wang X., Deng G., Wang T., Chen C. Deep Reinforcement Learning for Multi-contact Motion Planning of Hexapod Robots. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. 2021;2381–2388. DOI: 10.24963/ijcai.2021/328.

Geng T., Porr B., Wörgötter F. Fast biped walking with a sensor-driven neuronal controller and real-time online learning. The International Journal of Robotics Research. 2006;25(3):243–259.

Schilling M., Konen K., Ohl F.W., Korthals T. Decentralized Deep Reinforcement Learning for a Distributed and Adaptive Locomotion Controller of a Hexapod Robot. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA. 2020;5335–5342. DOI: 10.1109/IROS45743.2020.9341754.

Tien Y., Yang C., Hooman S. Reinforcement learning and convolutional neural network system for firefighting rescue robot. MATEC Web of Conferences. 2018;161. DOI: 10.1051/matecconf/201816103028.

Sutton R.S., Barto A.G. Reinforcement Learning: An Introduction. 2nd ed. Moscow: DMK Press; 2020. 552 p. (Russian translation).

The authors declare no conflicts of interest.