References

moitvivt

Моделирование, оптимизация и информационные технологии

Modeling, Optimization and Information Technology

2310-6018

Publisher

10.26102/2310-6018/2025.49.2.036

1920

Human pose estimation from video stream

0009-0008-5222-2664

Потенко

Максим Алексеевич

Potenko

Maxim Alexeevich

potenkog@gmail.com

Московский авиационный институт (национальный исследовательский университет) Moscow Aviation Institute (National Research University)

01 01 2026

1 1


2026

This work is licensed under a Creative Commons Attribution 4.0 International License

This article studies a human body pose estimation system built on two neural networks. The system determines the spatial location of 33 keypoints corresponding to the main joints of the human body (wrists, elbows, shoulders, feet, etc.) and constructs a segmentation mask that delineates the boundaries of the human figure in an image. The first network is an object detector based on the Single Shot Detector (SSD) architecture and applies Feature Pyramid Network (FPN) principles. This approach combines features at different levels of abstraction and processes 224×224 input images to locate people in a frame. A distinctive feature of the implementation is its use of information from previous frames, which reduces computational cost. The second network detects keypoints and constructs the segmentation mask. It also relies on FPN-style multi-scale feature analysis, which supports accurate localization of keypoints and object boundaries. This network operates on 256×256 images, a resolution that provides the precision needed to determine spatial coordinates. The architecture is modular and scalable, so the system can be adapted to tasks that require a different number of keypoints. The results are applicable in computer vision, animation, cartoon production, security systems, and other areas involving the analysis and processing of visual information.
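The abstract does not spell out how information from previous frames is used. One common scheme, assumed here rather than taken from the article, is to run the detector only when no region of interest is available and otherwise derive the next region from the keypoints found in the previous frame. The sketch below illustrates that control flow only. `PoseTracker`, `box_from_keypoints`, the 0.25 margin, the 0.5 score threshold, and the two stub networks are all hypothetical names and values; real networks would resize their inputs to 224×224 and 256×256 internally.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

DETECTOR_INPUT = (224, 224)  # detector input size (from the abstract)
LANDMARK_INPUT = (256, 256)  # keypoint/segmentation input size (from the abstract)
NUM_KEYPOINTS = 33           # number of keypoints (from the abstract)

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


@dataclass
class PoseResult:
    keypoints: List[Point]  # NUM_KEYPOINTS points, normalized frame coordinates
    mask: object            # segmentation mask, kept opaque in this sketch
    score: float            # confidence that a person is present in the region


def box_from_keypoints(points: Sequence[Point], margin: float = 0.25) -> Box:
    """Box around the keypoints, enlarged by `margin` and clipped to the frame."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx = margin * (max(xs) - min(xs))
    my = margin * (max(ys) - min(ys))
    return (max(0.0, min(xs) - mx), max(0.0, min(ys) - my),
            min(1.0, max(xs) + mx), min(1.0, max(ys) + my))


class PoseTracker:
    """Runs the detector only when the previous frame left no usable region."""

    def __init__(self,
                 detect: Callable[[object], Optional[Box]],
                 estimate: Callable[[object, Box], PoseResult],
                 min_score: float = 0.5) -> None:
        self.detect = detect        # hypothetical first network (person detector)
        self.estimate = estimate    # hypothetical second network (keypoints + mask)
        self.min_score = min_score  # assumed threshold for "track lost"
        self.roi: Optional[Box] = None
        self.detector_calls = 0

    def process(self, frame: object) -> Optional[PoseResult]:
        if self.roi is None:
            # No region carried over from the previous frame: run the detector.
            self.detector_calls += 1
            self.roi = self.detect(frame)
            if self.roi is None:
                return None
        result = self.estimate(frame, self.roi)
        if result.score < self.min_score:
            # Track lost: force a fresh detection on the next frame.
            self.roi = None
            return None
        # Reuse this frame's keypoints as the next frame's region of interest.
        self.roi = box_from_keypoints(result.keypoints)
        return result


def fake_detect(frame: object) -> Optional[Box]:
    """Stub detector: always reports a person in the centre of the frame."""
    return (0.25, 0.25, 0.75, 0.75)


def fake_estimate(frame: dict, roi: Box) -> PoseResult:
    """Stub keypoint network: spreads points along the diagonal of the region."""
    x0, y0, x1, y1 = roi
    step = 1.0 / (NUM_KEYPOINTS + 1)
    pts = [(x0 + (x1 - x0) * (i + 1) * step, y0 + (y1 - y0) * (i + 1) * step)
           for i in range(NUM_KEYPOINTS)]
    # Toy frames carry the confidence that the stub should report.
    return PoseResult(pts, mask=None, score=frame["score"])


tracker = PoseTracker(fake_detect, fake_estimate)
outputs = [tracker.process({"score": s}) for s in (0.9, 0.9, 0.2, 0.9)]
print(tracker.detector_calls)  # detector runs on frame 1 and again after the loss
```

With the toy confidence sequence above, the detector is skipped on the second frame because the first frame's keypoints supply the region, and it runs again only after the low-confidence third frame drops the track.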

neural networks; convolutional neural networks; machine learning; computer vision; human pose estimation; keypoints; image segmentation

The study was performed without external funding.

References

Andriluka M., Pishchulin L., Gehler P., Schiele B. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 23–28 June 2014, Columbus, OH, USA. IEEE; 2014. P. 3686–3693. https://doi.org/10.1109/CVPR.2014.471

Newell A., Yang K., Deng J. Stacked Hourglass Networks for Human Pose Estimation. In: Computer Vision – ECCV 2016: 14th European Conference: Proceedings: Part VIII, 11–14 October 2016, Amsterdam, The Netherlands. Cham: Springer; 2016. P. 483–499. https://doi.org/10.1007/978-3-319-46484-8_29

Zhao Zh.-Q., Zheng P., Xu Sh.-T., Wu X. Object Detection With Deep Learning: A Review. IEEE Transactions on Neural Networks and Learning Systems. 2019;30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865

Zhang F., Zhu X., Ye M. Fast Human Pose Estimation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15–20 June 2019, Long Beach, CA, USA. IEEE; 2019. P. 3512–3521. https://doi.org/10.1109/CVPR.2019.00363

Guo M.-H., Xu T.-X., Liu J.-J., et al. Attention Mechanisms in Computer Vision: A Survey. Computational Visual Media. 2022;8(3):331–368. https://doi.org/10.1007/s41095-022-0271-y

Liu W., Anguelov D., Erhan D., et al. SSD: Single Shot MultiBox Detector. In: Computer Vision – ECCV 2016: 14th European Conference: Proceedings: Part I, 11–14 October 2016, Amsterdam, The Netherlands. Cham: Springer; 2016. P. 21–37. https://doi.org/10.1007/978-3-319-46448-0_2

Lin T.-Yi, Maire M., Belongie S., et al. Microsoft COCO: Common Objects in Context. In: Computer Vision – ECCV 2014: 13th European Conference: Proceedings: Part V, 06–12 September 2014, Zurich, Switzerland. Cham: Springer; 2014. P. 740–755. https://doi.org/10.1007/978-3-319-10602-1_48

Lin T.-Yi, Dollár P., Girshick R., He K., Hariharan B., Belongie S. Feature Pyramid Networks for Object Detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21–26 July 2017, Honolulu, HI, USA. IEEE; 2017. P. 936–944. https://doi.org/10.1109/CVPR.2017.106

He K., Zhang X., Ren Sh., Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, Las Vegas, NV, USA. IEEE; 2016. P. 770–778. https://doi.org/10.1109/CVPR.2016.90

Simonyan K., Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv. URL: https://doi.org/10.48550/arXiv.1409.1556 [Accessed 25th March 2025].

Neubeck A., Van Gool L. Efficient Non-Maximum Suppression. In: 18th International Conference on Pattern Recognition (ICPR'06), 20–24 August 2006, Hong Kong, China. IEEE; 2006. P. 850–855. https://doi.org/10.1109/ICPR.2006.479

Kingma D.P., Ba J. Adam: A Method for Stochastic Optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, 07–09 May 2015, San Diego, CA, USA. 2015. https://doi.org/10.48550/arXiv.1412.6980

Charbonnier P., Blanc-Féraud L., Aubert G., Barlaud M. Two Deterministic Half-Quadratic Regularization Algorithms for Computed Imaging. In: Proceedings of 1st International Conference on Image Processing, 13–16 November 1994, Austin, TX, USA. IEEE; 1994. P. 168–172. https://doi.org/10.1109/ICIP.1994.413553

Goodfellow I., Bengio Yo., Courville A. Deep Learning. Cambridge: MIT Press; 2016. 800 p.

Potenko M.A. Application of Synthetic Data in Training Neural Networks for Human Pose Estimation. In: Experimental and Theoretical Research in Modern Science: Proceedings of the CVIII International Scientific and Practical Conference, 25 December 2024, Novosibirsk, Russia. Novosibirsk: Sibirskaya akademicheskaya kniga; 2024. P. 11–17. (In Russ.).

The author declares no conflicts of interest.