<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Publisher</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2024.44.1.012</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">1510</article-id>
      <title-group>
        <article-title xml:lang="ru">Идентификация автора текста для открытого множества кандидатов в контексте кибербезопасности</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Text authorship identification for an open set of candidates in the cybersecurity context</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes">
          <contrib-id contrib-id-type="orcid">0000-0002-2587-2222</contrib-id>
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Романов</surname>
              <given-names>Александр Сергеевич</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Romanov</surname>
              <given-names>Aleksandr Sergeevich</given-names>
            </name>
          </name-alternatives>
          <email>alexx.romanov@gmail.com</email>
          <xref ref-type="aff">aff-1</xref>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Томский государственный университет систем управления и радиоэлектроники</aff>
        <aff xml:lang="en">Tomsk State University of Control Systems and Radioelectronics</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <elocation-id>10.26102/2310-6018/2024.44.1.012</elocation-id>
      <permissions>
        <copyright-statement>Copyright © The Authors, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=1510"/>
      <abstract xml:lang="ru">
        <p>В работе рассмотрены методы определения авторства любительских сочинений по мотивам популярных произведений литературы и кинематографа. Данные для проведения исследования включают тексты 5 самых популярных тематик онлайн-библиотеки Ficbook. Наиболее распространенной является задача атрибуции с закрытым набором кандидатов. Относительно практических задач можно предполагать, что не всегда истинный автор анонимного текста будет присутствовать в списке кандидатов. Поэтому процесс определения автора рассматривался как усложненная модификация классической задачи классификации, приведенная к виду открытого множества авторов. Предложенные методы основаны на авторской комбинации fastText и One-Class SVM с отбором информативных признаков, а также на статистических оценках мер сходства векторных представлений. Статистические методы оказались наименее эффективны даже для простого, кросс-тематического, случая, в котором данные методы уступают в точности одноклассовому SVM до 15 %. Для той же кросс-тематической задачи средняя точность авторской методики на основе совместного применения fastText и One-Class SVM составляет 85 %. В сложном случае внутритематической классификации авторов точность представленной методики варьируется от 75 до 78 % в зависимости от тематической группы.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>The paper considers methods of authorship identification for fanfiction texts based on popular works of literature and cinema. The data for the study include texts from the five most popular topics of the Ficbook online library. The most common formulation is the closed-set attribution task. In practice, however, the true author of an anonymous text cannot be assumed to be among the candidates. Therefore, authorship identification was treated as a harder variant of the standard classification problem, namely classification over an open set of authors. The proposed methods are based on an original combination of fastText and One-Class SVM with informative feature selection, and on statistical estimates of similarity measures between vector representations. The statistical methods proved the least effective even in the simple cross-thematic case, trailing One-Class SVM in accuracy by up to 15 %. For cross-thematic attribution, the average accuracy of the proposed method combining fastText and One-Class SVM with feature selection was 85 %; for the more complex task of within-topic classification, it ranged from 75 to 78 % depending on the thematic group.</p>
      </trans-abstract>
      <kwd-group xml:lang="ru">
        <kwd>определение автора текста</kwd>
        <kwd>fastText</kwd>
        <kwd>машинное обучение</kwd>
        <kwd>анализ текста</kwd>
        <kwd>информационная безопасность</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>text authorship attribution</kwd>
        <kwd>fastText</kwd>
        <kwd>machine learning</kwd>
        <kwd>text analysis</kwd>
        <kwd>information security</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Данная работа выполнена при финансовой поддержке Министерства науки и высшего образования РФ в рамках базовой части государственного задания ТУСУРа на 2023–2025 гг. (проект № FEWM-2023-0015).</funding-statement>
        <funding-statement xml:lang="en">This research was funded by the Ministry of Science and Higher Education of the Russian Federation within the basic part of the state assignment of TUSUR for 2023–2025 (project No. FEWM-2023-0015).</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">Romanov A., Kurtukova A., Shelupanov A., Fedotova A., Goncharov V. Authorship identification of a Russian-language text using support vector machine and deep neural networks. Future Internet. 2020;13(1):3. DOI: 10.3390/fi13010003.</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Romanov A., Kurtukova A., Sobolev A., Shelupanov A., Fedotova A. Determining the age of the author of the text based on deep neural network models. Information. 2020;11(12):589. DOI: 10.3390/info11120589.</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Jafariakinabad F., Hua K.A. Unifying lexical, syntactic, and structural representations of written language for authorship attribution. SN Computer Science. 2021;2(6):481. DOI: 10.1007/s42979-021-00911-2.</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Mahor U., Aarti K. A Comparative Study of Stylometric Characteristics in Authorship Attribution. In: Information and Communication Technology for Competitive Strategies (ICTCS 2021): ICT: Applications and Social Interfaces. Singapore: Springer Nature Singapore; 2022. p. 71–81. DOI: 10.1007/978-981-19-0095-2.</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Fedotova A., Romanov A., Kurtukova A., Shelupanov A. Authorship attribution of social media and literary Russian-language texts using machine learning methods and feature selection. Future Internet. 2021;14(1):4. DOI: 10.3390/fi14010004.</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">PAN: series of scientific events and shared tasks on digital text forensics and stylometry. URL: https://pan.webis.de (accessed: 19.01.2024).</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">The 100 Idiolectic Project. URL: https://fold.aston.ac.uk/handle/123456789/17 (accessed: 19.01.2024).</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Najafi M., Tavan E. Text-to-text transformer in authorship verification via stylistic and semantical analysis. Proceedings of the CLEF. 2022. URL: https://ceur-ws.org/Vol-3180/paper-215.pdf (accessed: 19.01.2024).</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Drozdova A., Petrov V. Modern classic in the web environment: narrative variations of V. Nabokov’s in fanfiction. Acta Universitatis Sapientiae, Film and Media Studies. 2020;18(1):89–107. DOI: 10.2478/ausfm-2020-0005.</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Shafirova L., Cassany D., Bach C. Transcultural literacies in online collaboration: a case study of fanfiction translation from Russian into English. Language and Intercultural Communication. 2020;20(6):531–545. DOI: 10.1080/14708477.2020.1812621.</mixed-citation>
      </ref>
      <ref id="cit11">
        <label>11</label>
        <mixed-citation xml:lang="ru">Swain S., Mishra G., Sindhu C. Recent approaches on authorship attribution techniques: an overview. In: 2017 International conference of Electronics, Communication and Aerospace Technology (ICECA). IEEE. Coimbatore, India. 2017. p. 557–566. DOI: 10.1109/iceca.2017.8203599.</mixed-citation>
      </ref>
      <ref id="cit12">
        <label>12</label>
        <mixed-citation xml:lang="ru">Hedegaard S., Simonsen J.G. Lost in translation: Authorship attribution using frame semantics. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011. p. 65–70. URL: https://aclanthology.org/P11-2012.pdf (accessed: 19.01.2024).</mixed-citation>
      </ref>
      <ref id="cit13">
        <label>13</label>
        <mixed-citation xml:lang="ru">Соколова Т.П. Проблемы экспертной идентификации в судебном автороведении. Вестник Университета имени О.Е. Кутафина (МГЮА). 2022;2(90):67–76.</mixed-citation>
      </ref>
      <ref id="cit14">
        <label>14</label>
        <mixed-citation xml:lang="ru">Ficbook: Fanfiction book. URL: https://ficbook.net/ (accessed: 19.01.2024).</mixed-citation>
      </ref>
      <ref id="cit15">
        <label>15</label>
        <mixed-citation xml:lang="ru">Романов А.С. Методы отбора признаков в задаче определения авторства в контексте кибербезопасности. Моделирование, оптимизация и информационные технологии. 2024;12(1). URL: https://moitvivt.ru/ru/journal/pdf?id=1489. DOI: 10.26102/2310-6018/2024.44.1.001.</mixed-citation>
      </ref>
      <ref id="cit16">
        <label>16</label>
        <mixed-citation xml:lang="ru">Mohammed A.A., Umaashankar V. Effectiveness of hierarchical softmax in large scale classification tasks. 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE. 2018. p. 1090–1094. DOI: 10.1109/ICACCI.2018.8554637.</mixed-citation>
      </ref>
      <ref id="cit17">
        <label>17</label>
        <mixed-citation xml:lang="ru">Lei K., Fu Q., Yang M., Liang Y. Tag recommendation by text classification with attention-based capsule network. Neurocomputing. 2020;391:65–73. DOI: 10.1016/j.neucom.2020.01.091.</mixed-citation>
      </ref>
      <ref id="cit18">
        <label>18</label>
        <mixed-citation xml:lang="ru">Suwanda R., Syahputra Z., Zamzami E.M. Analysis of Euclidean distance and Manhattan distance in the K-means algorithm for variations number of centroid K. Journal of Physics: Conference Series, IOP Publishing. 2020;1566(1):012058. DOI: 10.1088/1742-6596/1566/1/012058.</mixed-citation>
      </ref>
      <ref id="cit19">
        <label>19</label>
        <mixed-citation xml:lang="ru">Martín-del-Campo-Rodríguez C., Sidorov G., Batyrshin I. Unsupervised authorship attribution using feature selection and weighted cosine similarity. Journal of Intelligent &amp; Fuzzy Systems. 2022;42(5):4357–4367.</mixed-citation>
      </ref>
      <ref id="cit20">
        <label>20</label>
        <mixed-citation xml:lang="ru">Park K., Hong J.S., Kim W. A methodology combining cosine similarity with classifier for text classification. Applied Artificial Intelligence. 2020;34(5):396–411. DOI: 10.1080/08839514.2020.1723868.</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The author declares no conflicts of interest.</p>
      </fn>
    </fn-group>
  </back>
</article>