References

moitvivt

Modeling, Optimization and Information Technology

2310-6018

Publisher

10.26102/2310-6018/2024.44.1.033

1536

Detecting machine-generated texts with adaptive quantile regression

0009-0000-1313-5826

Tyurin

Aleksey Sergeevich

leha2148@gmail.com aff-1

0000-0002-1373-2521

Saraev

Pavel Viktorovich

psaraev@yandex.ru aff-2

Lipetsk State Technical University

01 01 2026

1 1


2026

This work is licensed under a Creative Commons Attribution 4.0 International License

This paper addresses the detection of machine-generated texts using several regression modeling tools: classical linear regression, logistic regression, and quantile regression. Advances in machine learning make it possible to generate increasingly realistic texts, which opens the door to their misuse. As text generation algorithms grow more sophisticated, detecting such texts becomes harder and calls for more advanced mathematical modeling methods and more efficient numerical methods. The adaptive quantile regression algorithm considered here builds models that emphasize different quantiles, which makes it particularly useful for detecting atypical values that may indicate the artificial origin of a text. The paper also gives a detailed description of the open dataset used to train the models, which consists of texts generated with the ChatGPT 3 model and random human-written texts from various forums, and analyzes the computational experiments performed. The results show that the proposed method is highly effective in this application area.
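The abstract does not spell out the adaptive algorithm itself, so the following is only a minimal sketch of ordinary quantile regression fitted by plain subgradient descent on the pinball loss, to illustrate what "emphasis on different quantiles" means. The function names, the synthetic data, the learning rate, and the epoch count are all assumptions made for illustration and are not taken from the paper.

```python
import random


def pinball_loss(residual, tau):
    """Quantile (pinball) loss for one residual y - prediction."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def fit_quantile_line(xs, ys, tau, lr=0.1, epochs=2000):
    """Fit y ~ a + b*x for quantile level tau by plain subgradient descent.

    Hypothetical helper: this is NOT the adaptive algorithm from the paper.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            # Subgradient of the pinball loss with respect to the prediction:
            # -tau when the point lies on or above the line, 1 - tau otherwise.
            g = -tau if y - (a + b * x) >= 0 else 1.0 - tau
            grad_a += g / n
            grad_b += g * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Synthetic data (an assumption): a linear trend with Gaussian noise.
random.seed(0)
xs = [random.uniform(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + random.gauss(0.0, 0.3) for x in xs]

# One line per quantile level; each emphasizes a different part of the data.
fits = {tau: fit_quantile_line(xs, ys, tau) for tau in (0.1, 0.5, 0.9)}

# Share of points on or below each fitted line; it should be close to tau.
coverage = {
    tau: sum(y <= a + b * x for x, y in zip(xs, ys)) / len(xs)
    for tau, (a, b) in fits.items()
}
```

In a sketch like this, a point that falls outside the band between the 0.1 and 0.9 lines would be a candidate atypical value. How such values are turned into a machine-generated versus human-written decision is described in the paper itself and is not reproduced here.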

text classification, quantile regression, adaptive algorithm, gradient descent, mathematical modeling, numerical methods

The study was performed without external funding.

References

He Y., Qiu J., Zhang W., Yuan Z. Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models. URL: http://arxiv.org/abs/2402.01725 (accessed: 03.02.2024).

Seo Ji-Hoon, Lee Ho-Sun, Choi Jin-Tak. Classification Technique for Filtering Sentiment Vocabularies for the Enhancement of Accuracy of Opinion Mining. International journal of u- and e-service, science and technology. 2015;8(10):11–20. DOI: 10.14257/ijunesst.2015.8.10.02.

Sandler M., Choung H., Ross A., David P. A Linguistic Comparison between Human and ChatGPT-Generated Conversations. URL: https://arxiv.org/pdf/2401.16587.pdf (accessed: 05.02.2024).

Hans A., et al. Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text. URL: https://arxiv.org/pdf/2401.12070.pdf (accessed: 04.02.2024).

Zheng Qi, Peng Limin, He Xuming. Globally adaptive quantile regression with ultra-high dimensional data. The Annals of Statistics. 2015;43(5):2225–2258. DOI: 10.1214/15-AOS1340.

Barrodale I., Roberts F.D.K. An Improved Algorithm for Discrete l1 Linear Approximation. SIAM Journal on Numerical Analysis. 1973;10(5):839–848. DOI: 10.1137/0710069.

Chen C. An Adaptive Algorithm for Quantile Regression. In: Theory and Applications of Recent Robust Methods by ICORS2003: International Conference on Robust Statistics – 2003, 13–18 July 2003, Antwerp, Belgium. Basel: Springer Basel AG; 2004. pp. 39–48.

Chen C. A Finite Smoothing Algorithm for Quantile Regression. Journal of Computational and Graphical Statistics. 2007;16(1):136–164. DOI: 10.1198/106186007X180336.

Tyurin A.S. Adaptive quantile regression. Modeling, Optimization and Information Technology. 2024;12(1). URL: https://moitvivt.ru/ru/journal/pdf?id=1514. DOI: 10.26102/2310-6018/2024.44.1.016 (accessed: 07.02.2024). (In Russ.).

Duan T., Avati A., Ding D.Y., Thai K.K., Basu S., Ng A., Schuler A. NGBoost: Natural Gradient Boosting for Probabilistic Prediction. In: ICML 2020: 37th International Conference on Machine Learning: Proceedings of the 37th International Conference on Machine Learning, 13–18 July 2020, Vienna, Austria. 2020. pp. 2690–2700.

Tyurin A.S., Saraev P.V. Construction of quantile regression using natural gradient descent. Applied Mathematics and Control Sciences. 2023;(2):43–52. DOI: 10.15593/2499-9873/2023.2.04. (In Russ.).

The authors declare no conflicts of interest.