References

moitvivt

Моделирование, оптимизация и информационные технологии

Modeling, Optimization and Information Technology

2310-6018

Publisher

10.26102/2310-6018/2025.49.2.002

1863

Using ResNet and Transformer architectures in the problem of source code generation from an image

Nikitin

Ilya Vladimirovich

vic096@yandex.ru

Plekhanov Russian University of Economics

01 01 2026



2026

This work is licensed under a Creative Commons Attribution 4.0 International License

This study examines ways to optimize a system that generates source code from an image. The original system consists of two parts: an autoencoder that processes images and extracts the necessary features, and a text-processing component built on LSTM blocks. Many new approaches have recently appeared for improving both image processing and text processing and prediction. In this study, the ResNet architecture was chosen to improve the image-processing part, and the Transformer architecture to improve the text-prediction part. The experiments compared systems built from different combinations of the original architecture, ResNet, and Transformers, and assessed prediction quality using the BLEU and chrF++ metrics as well as functional tests. The combination of ResNet and Transformer architectures gave the best results for generating source code from an image, but it also required the longest training time.
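To make the character-level evaluation mentioned in the abstract more concrete, the following is a minimal sketch of a chrF-style score in Python. It is a simplification and not the metric implementation used in the study: it averages character n-gram precision and recall over orders 1 to 6 and combines them with an F-beta score (beta = 2), and it omits the word n-grams that chrF++ adds. The function names, the whitespace stripping, and the sample markup strings are assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring whitespace (an assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_sketch(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score in [0, 1] (no word n-grams)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Hypothetical generated and reference markup, for illustration only
    generated = "header { btn-active }"
    expected = "header { btn-active, btn-inactive }"
    print(round(chrf_sketch(generated, expected), 3))
```

A score of this kind rewards partially correct output, which is one reason character-level metrics are often paired with functional tests when judging generated code.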

code generation, image, machine learning, ResNet, Transformers

The study was performed without external funding.

References

Beltramelli T. pix2code: Generating Code from a Graphical User Interface Screenshot. In: EICS '18: Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems, 19–22 June 2018, Paris, France. New York: Association for Computing Machinery; 2018. https://doi.org/10.1145/3220134.3220135

Zhu Zh., Xue Zh., Yuan Z. Automatic Graphics Program Generation Using Attention-Based Hierarchical Decoder. In: Computer Vision – ACCV 2018: 14th Asian Conference on Computer Vision: Revised Selected Papers: Part VI, 02–06 December 2018, Perth, Australia. Cham: Springer; 2019. pp. 181–196. https://doi.org/10.1007/978-3-030-20876-9_12

Liu Ya., Hu Q., Shu K. Improving pix2code Based BI-directional LSTM. In: 2018 IEEE International Conference on Automation, Electronics and Electrical Engineering (AUTEEE), 16–18 November 2018, Shenyang, China. IEEE; 2019. pp. 220–223. https://doi.org/10.1109/AUTEEE.2018.8720784

Zou D., Wu G. Automatic Code Generation for Android Applications Based on Improved Pix2code. Journal of Artificial Intelligence and Technology. 2024;4(4):325–331. https://doi.org/10.37965/jait.2024.0515

Nikitin I.V. The Influence of the TensorFlow Library Version on the Quality of Code Generation from an Image. Modeling, Optimization and Information Technology. 2024;12(4). (In Russ.). https://doi.org/10.26102/2310-6018/2024.47.4.040

Nikitin I.V. Evaluating the Quality of the Obtained Result in the Task of Source Code Generation from an Image. Modeling, Optimization and Information Technology. 2025;13(1). (In Russ.). https://doi.org/10.26102/2310-6018/2025.48.1.030

He K., Zhang X., Ren Sh., Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, Las Vegas, USA. IEEE; 2016. pp. 770–778. https://doi.org/10.1109/CVPR.2016.90

Balduzzi D., Frean M., Leary L., Lewis J.P., Wan-Duo Ma K., McWilliams B. The Shattered Gradients Problem: If resnets are the answer, then what is the question? In: ICML'17: Proceedings of the 34th International Conference on Machine Learning, 06–11 August 2017, Sydney, Australia. 2017. pp. 342–350. https://doi.org/10.48550/arXiv.1702.08591

Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser Ł., Polosukhin I. Attention Is All You Need. In: NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 04–09 December 2017, Long Beach, USA. New York: Curran Associates Inc.; 2017. pp. 6000–6010.

Chen W.-Y., Podstreleny P., Cheng W.-H., Chen Y.-Y., Hua K.-L. Code Generation From a Graphical User Interface Via Attention-Based Encoder-Decoder Model. Multimedia Systems. 2022;28(1):121–130. https://doi.org/10.1007/s00530-021-00804-7

Popović M. chrF++: Words Helping Character N-grams. In: Proceedings of the Second Conference on Machine Translation, 07–08 September 2017, Copenhagen, Denmark. Association for Computational Linguistics; 2017. pp. 612–618. https://doi.org/10.18653/v1/W17-4770

The author declares no conflicts of interest.