<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Издательство</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2025.48.1.030</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">1830</article-id>
      <title-group>
        <article-title xml:lang="ru">Оценка качества полученного результата в задаче генерации исходного кода по изображению</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Assessing the quality of the result in the problem of source code generation from an image</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Никитин</surname>
              <given-names>Илья Владимирович</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Nikitin</surname>
              <given-names>Ilya Vladimirovich</given-names>
            </name>
          </name-alternatives>
          <email>vic096@yandex.ru</email>
          <xref ref-type="aff">aff-1</xref>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Российский экономический университет имени Г.В. Плеханова</aff>
        <aff xml:lang="en">Plekhanov Russian University of Economics</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <elocation-id>10.26102/2310-6018/2025.48.1.030</elocation-id>
      <permissions>
        <copyright-statement>Copyright © Авторы, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=1830"/>
      <abstract xml:lang="ru">
        <p>Исследование представляет собой оценку возможности построения системы выполнения функциональных тестов для задачи генерации исходного кода из изображения. Существует много различных метрик для оценки качества предсказанного нейронной сетью текста: от математических, таких как BLEU, ROUGE, до таких, которые используют другую модель для оценки, как, например, BERTScore, BLEURT. Однако проблема генерации исходного кода программы состоит в том, что код представляет собой набор инструкций для выполнения определенной задачи. Актуальность состоит в том, что в публикациях, связанных с системой pix2code, отсутствовало упоминание об автоматизированной тестовой среде, которая сможет проверить соответствие полученного кода заданным условиям. В ходе проделанной работы была реализована подсистема, которая в автоматическом режиме может получить информацию о различиях между изображением, основанным на предсказанном коде, и изображением, основанным на эталонном коде. Также результаты работы этой системы сопоставлены с метрикой BLEU. Проведенный эксперимент позволяет сделать вывод о том, что значение BLEU и результаты выполнения тестов не имеют явной зависимости между собой, а значит, функциональные тесты необходимы для дополнительной проверки эффективности работы модели.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>This study assesses the feasibility of building a system for executing functional tests for the task of generating source code from an image. There are many metrics for assessing the quality of text predicted by a neural network, ranging from mathematical ones, such as BLEU and ROUGE, to those that use another model for evaluation, such as BERTScore and BLEURT. However, the difficulty with generating program source code is that code is a set of instructions for performing a specific task. The relevance of this work lies in the fact that publications related to the pix2code system make no mention of an automated test environment capable of checking whether the generated code meets the specified conditions. In the course of this work, a subsystem was implemented that can automatically obtain information about the differences between an image rendered from the predicted code and an image rendered from the reference code. The results produced by this subsystem were also compared against the BLEU metric. The experiment leads to the conclusion that the BLEU score and the test results show no obvious relationship, which means that functional tests are necessary as an additional check of the model's effectiveness.</p>
      </trans-abstract>
      <kwd-group xml:lang="ru">
        <kwd>кодогенерация</kwd>
        <kwd>изображение</kwd>
        <kwd>машинное обучение</kwd>
        <kwd>BLEU</kwd>
        <kwd>функциональные тесты</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>code generation</kwd>
        <kwd>image</kwd>
        <kwd>machine learning</kwd>
        <kwd>BLEU</kwd>
        <kwd>functional tests</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Исследование выполнено без спонсорской поддержки.</funding-statement>
        <funding-statement xml:lang="en">The study was performed without external funding.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">Никитин И.В. Влияние версии библиотеки TensorFlow на качество генерации кода по изображению. Моделирование, оптимизация и информационные технологии. 2024;12(4). https://doi.org/10.26102/2310-6018/2024.47.4.040</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Zou D., Wu G. Automatic Code Generation for Android Applications Based on Improved Pix2code. Journal of Artificial Intelligence and Technology. 2024;4(4):325–331. https://doi.org/10.37965/jait.2024.0515</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Beltramelli T. pix2code: Generating Code from a Graphical User Interface Screenshot. In: EICS '18: Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems, 19–22 June 2018, Paris, France. New York: Association for Computing Machinery; 2018. https://doi.org/10.1145/3220134.3220135</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Zhu Zh., Xue Zh., Yuan Z. Automatic Graphics Program Generation Using Attention-Based Hierarchical Decoder. In: Computer Vision – ACCV 2018: 14th Asian Conference on Computer Vision: Revised Selected Papers: Part VI, 02–06 December 2018, Perth, Australia. Cham: Springer; 2019. pp. 181–196. https://doi.org/10.1007/978-3-030-20876-9_12</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Papineni K., Roukos S., Ward T., Zhu W.-J. BLEU: a Method for Automatic Evaluation of Machine Translation. In: ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 07–12 July 2002, Philadelphia, USA. Stroudsburg: Association for Computational Linguistics; 2002. pp. 311–318. https://doi.org/10.3115/1073083.1073135</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">Doddington G. Automatic Evaluation of Machine Translation Quality Using N-gram Co-occurrence Statistics. In: HLT '02: Proceedings of the Second International Conference on Human Language Technology Research, 24–27 March 2002, San Diego, USA. San Francisco: Morgan Kaufmann Publishers Inc.; 2002. pp. 138–145. https://doi.org/10.3115/1289189.1289273</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">Lin Ch.-Y. ROUGE: A Package for Automatic Evaluation of Summaries. In: Proceedings of the Workshop on Text Summarization Branches Out, 25–26 July 2004, Barcelona, Spain. Association for Computational Linguistics; 2004. pp. 74–81.</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Popović M. chrF++: words helping character n-grams. In: Proceedings of the Second Conference on Machine Translation, 07–08 September 2017, Copenhagen, Denmark. Association for Computational Linguistics; 2017. pp. 612–618. https://doi.org/10.18653/v1/W17-4770</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Hendrycks D., Basart S., Kadavath S., et al. Measuring Coding Challenge Competence With APPS. In: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks, 06–14 December 2021, Online. https://doi.org/10.48550/arXiv.2105.09938</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Zhang T., Kishore V., Wu F., Weinberger K.Q., Artzi Y. BERTScore: Evaluating Text Generation with BERT. In: 8th International Conference on Learning Representations, ICLR 2020, 26–30 April 2020, Addis Ababa, Ethiopia. 2020. https://doi.org/10.48550/arXiv.1904.09675</mixed-citation>
      </ref>
      <ref id="cit11">
        <label>11</label>
        <mixed-citation xml:lang="ru">Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 02–07 June 2019, Minneapolis, USA. Association for Computational Linguistics; 2019. pp. 4171–4186. https://doi.org/10.18653/v1/N19-1423</mixed-citation>
      </ref>
      <ref id="cit12">
        <label>12</label>
        <mixed-citation xml:lang="ru">Rei R., Stewart C., Farinha A.C., Lavie A. COMET: A Neural Framework for MT Evaluation. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 16–20 November 2020, Online. Association for Computational Linguistics; 2020. pp. 2685–2702. https://doi.org/10.18653/v1/2020.emnlp-main.213</mixed-citation>
      </ref>
      <ref id="cit13">
        <label>13</label>
        <mixed-citation xml:lang="ru">Tran N., Tran H., Nguyen S., Nguyen H., Nguyen T. Does BLEU Score Work for Code Migration? In: 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), 25–26 May 2019, Montreal, USA. IEEE; 2019. pp. 165–176. https://doi.org/10.1109/ICPC.2019.00034</mixed-citation>
      </ref>
      <ref id="cit14">
        <label>14</label>
        <mixed-citation xml:lang="ru">Ren Sh., Guo D., Lu Sh., et al. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. arXiv. URL: https://doi.org/10.48550/arXiv.2009.10297 [Accessed 19th February 2025].</mixed-citation>
      </ref>
      <ref id="cit15">
        <label>15</label>
        <mixed-citation xml:lang="ru">Evtikhiev M., Bogomolov E., Sokolov Ya., Bryksin T. Out of the BLEU: How Should We Assess Quality of the Code Generation Models? Journal of Systems and Software. 2023;203:111741. https://doi.org/10.1016/j.jss.2023.111741</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The authors declare that there are no conflicts of interest present.</p>
      </fn>
    </fn-group>
  </back>
</article>