<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Издательство</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2024.47.4.038</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">1763</article-id>
      <title-group>
        <article-title xml:lang="ru">Оценка качества интеллектуального перефразирования текстов на русском языке</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Evaluation of the quality of intelligent text paraphrasing in Russian</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Дагаев</surname>
              <given-names>Александр Евгеньевич</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Dagaev</surname>
              <given-names>Alexander Evgenevich</given-names>
            </name>
          </name-alternatives>
          <email>a.e.dagaev@staff.mospolytech.ru</email>
          <xref ref-type="aff" rid="aff-1"/>
        </contrib>
        <contrib contrib-type="author">
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Попов</surname>
              <given-names>Дмитрий Иванович</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Popov</surname>
              <given-names>Dmitry Ivanovich</given-names>
            </name>
          </name-alternatives>
          <email>damitry.popov@gmail.com</email>
          <xref ref-type="aff" rid="aff-2"/>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Московский политехнический университет</aff>
        <aff xml:lang="en">Moscow Polytechnic University</aff>
      </aff-alternatives>
      <aff-alternatives id="aff-2">
        <aff xml:lang="ru">Сочинский государственный университет</aff>
        <aff xml:lang="en">Sochi State University</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>47</volume>
      <issue>4</issue>
      <elocation-id>038</elocation-id>
      <permissions>
        <copyright-statement>Copyright © Авторы, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=1763"/>
      <abstract xml:lang="ru">
        <p>Данное исследование посвящено разработке интегральной метрики для оценки качества моделей перефразирования текстов, что отвечает актуальной задаче создания комплексных и объективных методов оценки. В отличие от предыдущих исследований, преимущественно фокусирующихся на англоязычных наборах данных, настоящее исследование акцентирует внимание на наборах данных русского языка, которые до настоящего времени оставались недостаточно изученными. Использование таких датасетов, как Gazeta, XL-Sum и WikiLingua (для русского языка), а также CNN Dailymail и XSum (для английского языка), обеспечивает многоязычную применимость предложенного подхода. Предлагаемая метрика сочетает лексические (ROUGE, BLEU), структурные (ROUGE-L) и семантические (BERTScore, METEOR, BLEURT) критерии оценки с распределением весов, исходя из важности каждой метрики. Результаты демонстрируют превосходство моделей ChatGPT-4 на русскоязычных наборах и GigaChat на англоязычных наборах, тогда как модели Gemini и YouChat показывают ограниченные возможности в достижении семантической точности вне зависимости от языка датасета. Оригинальность исследования заключается в объединении метрик в единую систему, что делает возможным более объективное и комплексное сравнение языковых моделей. Исследование вносит вклад в область обработки естественного языка, предлагая инструмент для оценки качества языковых моделей.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>The study focuses on the development of an integral metric for evaluating the quality of text paraphrasing models, addressing the pressing need for comprehensive and objective evaluation methods. Unlike previous research, which predominantly focuses on English-language datasets, this study emphasizes Russian-language datasets, which have remained underexplored until now. The inclusion of datasets such as Gazeta, XL-Sum, and WikiLingua (for Russian) as well as CNN Dailymail and XSum (for English) ensures the multilingual applicability of the proposed approach. The proposed metric combines lexical (ROUGE, BLEU), structural (ROUGE-L), and semantic (BERTScore, METEOR, BLEURT) evaluation criteria, with weights assigned based on the importance of each metric. The results highlight the superiority of ChatGPT-4 on Russian datasets and GigaChat on English datasets, whereas models such as Gemini and YouChat exhibit limited capabilities in achieving semantic accuracy regardless of the dataset language. The originality of this research lies in the integration of multiple metrics into a unified system, enabling more objective and comprehensive comparisons of language models. The study contributes to the field of natural language processing by providing a tool for assessing the quality of language models.</p>
      </trans-abstract>
      <kwd-group xml:lang="ru">
        <kwd>обработка естественного языка</kwd>
        <kwd>перефразирование текста</kwd>
        <kwd>GigaChat</kwd>
        <kwd>YandexGPT 2</kwd>
        <kwd>ChatGPT-3.5</kwd>
        <kwd>ChatGPT-4</kwd>
        <kwd>Gemini</kwd>
        <kwd>Bing AI</kwd>
        <kwd>YouChat</kwd>
        <kwd>Mistral Large</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>natural language processing</kwd>
        <kwd>text paraphrasing</kwd>
        <kwd>GigaChat</kwd>
        <kwd>YandexGPT 2</kwd>
        <kwd>ChatGPT-3.5</kwd>
        <kwd>ChatGPT-4</kwd>
        <kwd>Gemini</kwd>
        <kwd>Bing AI</kwd>
        <kwd>YouChat</kwd>
        <kwd>Mistral Large</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Исследование выполнено без спонсорской поддержки.</funding-statement>
        <funding-statement xml:lang="en">The study was performed without external funding.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">Xie J., Agrawal A. Emotion and Sentiment Guided Paraphrasing. In: Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, &amp; Social Media Analysis, 13 July 2023, Toronto, Canada. Association for Computational Linguistics; 2023. pp. 58–70. https://doi.org/10.18653/v1/2023.wassa-1.7</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Krishna K., Song Y., Karpinska M., Wieting J., Iyyer M. Paraphrasing Evades Detectors of AI-Generated Text, but Retrieval is an Effective Defense. In: Advances in Neural Information Processing Systems: 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 10–16 December 2023, New Orleans, USA. Curran Associates; 2024. https://doi.org/10.48550/arXiv.2303.13408</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Sadasivan V.S., Kumar A., Balasubramanian S., Wang W., Feizi S. Can AI-Generated Text be Reliably Detected? arXiv. URL: https://doi.org/10.48550/arXiv.2303.11156 [Accessed 14th November 2024].</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Verma D., Lal Y.K., Sinha S., Van Durme B., Poliak A. Evaluating Paraphrastic Robustness in Textual Entailment Models. arXiv. URL: https://doi.org/10.48550/arXiv.2306.16722 [Accessed 14th November 2024].</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Shen L., Liu L., Jiang H., Shi S. On the Evaluation Metrics for Paraphrase Generation. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 07–11 December 2022, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics; 2022. pp. 3178–3190.</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">Weston J., Lenain R., Meepegama U., Fristed E. Generative Pretraining for Paraphrase Evaluation. arXiv. URL: https://doi.org/10.48550/arXiv.2107.08251 [Accessed 14th November 2024].</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">Sharma S., Joshi A., Mukhija N., Zhao Y., Bhathena H., Singh P., Santhanam S., Biswas P. Systematic review of effect of data augmentation using paraphrasing on Named entity recognition. In: NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research, 28 November – 09 December 2022, New Orleans, USA.</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Han T., Li D., Ma X., Hu N. Comparing product quality between translation and paraphrasing: Using NLP-assisted evaluation frameworks. Frontiers in Psychology. 2022;13. https://doi.org/10.3389/fpsyg.2022.1048132</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Ahn J., Khosmood F. Evaluation of Automatic Text Summarization using Synthetic Facts. arXiv. URL: https://doi.org/10.48550/arXiv.2204.04869 [Accessed 14th November 2024].</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Nicula B., Dascalu M., Newton N., Orcutt E., McNamara D.S. Automated Paraphrase Quality Assessment Using Recurrent Neural Networks and Language Models. In: Intelligent Tutoring Systems: 17th International Conference, ITS 2021: Proceedings, 07–11 June 2021, Online. Cham: Springer; 2021. pp. 333–340. https://doi.org/10.1007/978-3-030-80421-3_36</mixed-citation>
      </ref>
      <ref id="cit11">
        <label>11</label>
        <mixed-citation xml:lang="ru">Gusev I. Dataset for Automatic Summarization of Russian News. In: Artificial Intelligence and Natural Language: 9th Conference, AINL 2020: Proceedings, 07–09 October 2020, Helsinki, Finland. Cham: Springer; 2020. pp. 122–134. https://doi.org/10.1007/978-3-030-59082-6_9</mixed-citation>
      </ref>
      <ref id="cit12">
        <label>12</label>
        <mixed-citation xml:lang="ru">Hasan T., Bhattacharjee A., Islam M.S., Mubasshir K., Li Y.-F., Kang Y.-B., Rahman M.S., Shahriyar R. XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 01–06 August 2021, Online. Association for Computational Linguistics; 2021. pp. 4693–4703. https://doi.org/10.18653/v1/2021.findings-acl.413</mixed-citation>
      </ref>
      <ref id="cit13">
        <label>13</label>
        <mixed-citation xml:lang="ru">Ladhak F., Durmus E., Cardie C., McKeown K. WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization. In: Findings of the Association for Computational Linguistics: EMNLP 2020, 16–20 November 2020, Online. Association for Computational Linguistics; 2020. pp. 4034–4048. https://doi.org/10.18653/v1/2020.findings-emnlp.360</mixed-citation>
      </ref>
      <ref id="cit14">
        <label>14</label>
        <mixed-citation xml:lang="ru">Nallapati R., Zhou B., Dos Santos C., Gülçehre Ç., Xiang B. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 11–12 August 2016, Berlin, Germany. Berlin: Association for Computational Linguistics; 2016. pp. 280–290. https://doi.org/10.18653/v1/K16-1028</mixed-citation>
      </ref>
      <ref id="cit15">
        <label>15</label>
        <mixed-citation xml:lang="ru">Narayan S., Cohen S.B., Lapata M. Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 31 October – 04 November 2018, Brussels, Belgium. Association for Computational Linguistics; 2018. pp. 1797–1807. https://doi.org/10.18653/v1/D18-1206</mixed-citation>
      </ref>
      <ref id="cit16">
        <label>16</label>
        <mixed-citation xml:lang="ru">Patil O., Singh R., Joshi T. Understanding Metrics for Paraphrasing. arXiv. URL: https://doi.org/10.48550/arXiv.2205.13119 [Accessed 14th November 2024].</mixed-citation>
      </ref>
      <ref id="cit17">
        <label>17</label>
        <mixed-citation xml:lang="ru">Lin C.-Y. ROUGE: A Package for Automatic Evaluation of Summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, 25–26 July 2004, Barcelona, Spain. Association for Computational Linguistics; 2004. pp. 74–81.</mixed-citation>
      </ref>
      <ref id="cit18">
        <label>18</label>
        <mixed-citation xml:lang="ru">Zhang T., Kishore V., Wu F., Weinberger K.Q., Artzi Y. BERTScore: Evaluating Text Generation with BERT. In: Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, 26–30 April 2020, Addis Ababa, Ethiopia. Addis Ababa: International Conference on Learning Representations; 2020. pp. 1–43. https://doi.org/10.48550/arXiv.1904.09675</mixed-citation>
      </ref>
      <ref id="cit19">
        <label>19</label>
        <mixed-citation xml:lang="ru">Banerjee S., Lavie A. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 29 June 2005, Ann Arbor, USA. Association for Computational Linguistics; 2005. pp. 65–72.</mixed-citation>
      </ref>
      <ref id="cit20">
        <label>20</label>
        <mixed-citation xml:lang="ru">Post M. A Call for Clarity in Reporting BLEU Scores. In: Proceedings of the Third Conference on Machine Translation: Research Papers, 31 October – 01 November 2018, Brussels, Belgium. Association for Computational Linguistics; 2018. pp. 186–191. https://doi.org/10.18653/v1/W18-6319</mixed-citation>
      </ref>
      <ref id="cit21">
        <label>21</label>
        <mixed-citation xml:lang="ru">Sellam T., Das D., Parikh A. BLEURT: Learning Robust Metrics for Text Generation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 05–10 July 2020, Online. Association for Computational Linguistics; 2020. pp. 7881–7892. https://doi.org/10.18653/v1/2020.acl-main.704</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The authors declare that there are no conflicts of interest.</p>
      </fn>
    </fn-group>
  </back>
</article>