<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Издательство</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2026.54.3.013</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">2220</article-id>
      <title-group>
        <article-title xml:lang="ru">Гибридная семантическая редукция текстов в библиотечных информационных системах</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Hybrid semantic reduction of texts in library information systems</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0009-0002-1702-591X</contrib-id>
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Рзянкин</surname>
              <given-names>Илья Сергеевич</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Rzyankin</surname>
              <given-names>Ilya Sergeevich</given-names>
            </name>
          </name-alternatives>
          <email>i-rzyankin@yandex.ru</email>
          <xref ref-type="aff">aff-1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0001-8966-3633</contrib-id>
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Носков</surname>
              <given-names>Михаил Валерианович</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Noskov</surname>
              <given-names>Mikhail Valerianovich</given-names>
            </name>
          </name-alternatives>
          <email>mnoskov@sfu-kras.ru</email>
          <xref ref-type="aff">aff-2</xref>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Сибирский федеральный университет</aff>
        <aff xml:lang="en">Siberian Federal University</aff>
      </aff-alternatives>
      <aff-alternatives id="aff-2">
        <aff xml:lang="ru">Сибирский федеральный университет</aff>
        <aff xml:lang="en">Siberian Federal University</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <elocation-id>013</elocation-id>
      <permissions>
        <copyright-statement>Copyright © Авторы, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=2220"/>
      <abstract xml:lang="ru">
        <p>Актуальность исследования обусловлена ростом объемов текстовой информации в библиотечных информационных системах и необходимостью обеспечения быстрой и содержательной навигации по электронным фондам в условиях ограниченных вычислительных ресурсов. Существующие решения автоматической суммаризации ориентированы преимущественно на использование крупномасштабных языковых моделей, что затрудняет их внедрение в локальную библиотечную инфраструктуру. В связи с этим работа направлена на разработку ресурсосберегающего метода семантической редукции текста, обеспечивающего баланс между качеством смыслового представления и вычислительной доступностью. Ведущим подходом является гибридная архитектура, основанная на последовательном применении лексической редукции с использованием облаков слов и нейросетевой суммаризации компактными моделями. В исследовании предложена контекстно-ориентированная метрика оценки релевантности, учитывающая семантическую целостность, структурные характеристики и доменно значимые термины библиотечной среды. Экспериментальное исследование на корпусе из 1178 документов показало, что гибридный подход обеспечивает прирост показателей релевантности при одновременном сокращении времени инференса по сравнению с прямой нейросетевой суммаризацией полного текста. Полученные результаты подтверждают возможность практического внедрения предложенного метода в библиотечных информационных системах с ограниченной вычислительной инфраструктурой и его применимость для задач навигации и каталогизации.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>The relevance of the study is determined by the continuous growth of textual information in library information systems and the need to ensure fast and meaningful navigation across electronic collections under constrained computational resources. Existing automatic summarization solutions are primarily oriented toward large-scale language models, which limits their practical deployment within local library infrastructures. In this context, the paper aims to develop a resource-efficient method of semantic text reduction that balances the quality of semantic representation with computational feasibility. The proposed approach is based on a hybrid architecture that sequentially combines lexical reduction using word clouds with neural summarization performed by compact models. In addition, a context-oriented evaluation metric is introduced to assess relevance with regard to semantic coherence, structural characteristics, and domain-specific terms significant for the library environment. An experimental study conducted on a corpus of 1178 documents demonstrates that the hybrid approach improves relevance indicators while simultaneously reducing inference time compared to direct neural summarization of the full text. The obtained results confirm the practical applicability of the proposed method for library information systems operating under limited computational infrastructure and its usefulness for navigation and cataloging tasks.</p>
      </trans-abstract>
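The hybrid architecture described in the abstract (lexical reduction driven by word-cloud frequency statistics, followed by neural summarization with a compact model) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the English-only tokenizer, the tiny stopword list, the sentence-scoring rule, and the `summarize` callback standing in for a compact neural model are all assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption; a real system would use a full list).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on", "that"}

def word_cloud_weights(text: str) -> Counter:
    """Frequency weights over content words -- the statistic a word cloud visualizes."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS and len(w) > 2)

def lexical_reduction(text: str, keep_ratio: float = 0.5) -> str:
    """Stage 1: keep the sentences carrying the most word-cloud weight,
    preserving their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    weights = word_cloud_weights(text)

    def score(s: str) -> float:
        ws = re.findall(r"[a-zA-Z]+", s.lower())
        return sum(weights.get(w, 0) for w in ws) / (len(ws) or 1)

    n_keep = max(1, round(len(sentences) * keep_ratio))
    top = sorted(sorted(sentences, key=score, reverse=True)[:n_keep],
                 key=sentences.index)
    return " ".join(top)

def hybrid_reduce(text: str, summarize) -> str:
    """Stage 2: hand the lexically reduced text to a compact neural
    summarizer, passed in as a callable so the sketch stays self-contained."""
    return summarize(lexical_reduction(text))
```

In a deployment matching the paper's setting, `summarize` would wrap a compact sequence-to-sequence model; running it on the reduced text rather than the full document is what yields the inference-time savings the abstract reports.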
      <kwd-group xml:lang="ru">
        <kwd>семантическая редукция текста</kwd>
        <kwd>автоматическая суммаризация</kwd>
        <kwd>облако слов</kwd>
        <kwd>библиотечные информационные системы</kwd>
        <kwd>гибридные методы обработки текста</kwd>
        <kwd>нейросетевые модели</kwd>
        <kwd>оценка релевантности</kwd>
        <kwd>Library Relevance Score</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>semantic text reduction</kwd>
        <kwd>automatic summarization</kwd>
        <kwd>word cloud</kwd>
        <kwd>library information systems</kwd>
        <kwd>hybrid text processing methods</kwd>
        <kwd>neural models</kwd>
        <kwd>relevance evaluation</kwd>
        <kwd>Library Relevance Score</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Исследование выполнено без спонсорской поддержки.</funding-statement>
        <funding-statement xml:lang="en">The study was performed without external funding.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">Lyon L. The Informatics Transform: Re-Engineering Libraries for the Data Decade. International Journal of Digital Curation. 2012;7(1):126–138. https://doi.org/10.2218/ijdc.v7i1.220</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Roy P. Big data analytics in university libraries on today's librarianship decision-making: A disruptive innovation perspective. IFLA Journal. 2025;51(8). https://doi.org/10.1177/03400352251318753</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Mridha M.F., Lima A.A., Nur K., et al. A Survey of Automatic Text Summarization: Progress, Process and Challenges. IEEE Access. 2021;9:156043–156070. https://doi.org/10.1109/ACCESS.2021.3129786</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Arnaboldi V., Cho J., Sternberg P.W. Wormicloud: A new text summarization tool based on word clouds to explore the C. elegans literature. Database. 2021;2021. https://doi.org/10.1093/database/baab015</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Strubell E., Ganesh A., McCallum A. Energy and Policy Considerations for Deep Learning in NLP. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019: Volume 1: Long Papers, 28 July – 02 August 2019, Florence, Italy. Association for Computational Linguistics; 2019. P. 3645–3650. https://doi.org/10.18653/v1/P19-1355</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">Treviso M., Lee J.-U., Ji T., et al. Efficient Methods for Natural Language Processing: A Survey. Transactions of the Association for Computational Linguistics. 2023;11:826–860. https://doi.org/10.1162/tacl_a_00577</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">Syed A.A., Gaol F.L., Matsuo T. A Survey of the State-of-the-Art Models in Neural Abstractive Text Summarization. IEEE Access. 2021;9:13248–13265. https://doi.org/10.1109/ACCESS.2021.3052783</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Goodwin T., Savery M., Demner-Fushman D. Flight of the PEGASUS? Comparing Transformers on Few-Shot and Zero-Shot Multi-document Abstractive Summarization. In: Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, 08–13 December 2020, Barcelona, Spain (Online). International Committee on Computational Linguistics; 2020. P. 5640–5646. https://doi.org/10.18653/v1/2020.coling-main.494</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Li J. A comparative study of keyword extraction algorithms for English texts. Journal of Intelligent Systems. 2021;30:808–815. https://doi.org/10.1515/jisys-2021-0040</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Skeppstedt M., Ahltorp M., Kucher K., Lindström M. From word clouds to Word Rain: Revisiting the classic word cloud to visualize climate change texts. Information Visualization. 2024;23(2). https://doi.org/10.1177/14738716241236188</mixed-citation>
      </ref>
      <ref id="cit11">
        <label>11</label>
        <mixed-citation xml:lang="ru">Bhandari M., Gour P.N., Ashfaq A., Liu P., Neubig G. Re-evaluating Evaluation in Text Summarization. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, 16–20 November 2020, Online. Association for Computational Linguistics; 2020. P. 9347–9359. https://doi.org/10.18653/v1/2020.emnlp-main.751</mixed-citation>
      </ref>
      <ref id="cit12">
        <label>12</label>
        <mixed-citation xml:lang="ru">Hobson S.P., Dorr B.J., Monz Ch., Schwartz R. Task-based evaluation of text summarization using relevance prediction. Information Processing &amp; Management. 2007;43(6):1482–1499. https://doi.org/10.1016/j.ipm.2007.01.002</mixed-citation>
      </ref>
      <ref id="cit13">
        <label>13</label>
        <mixed-citation xml:lang="ru">Ushio A., Liberatore F., Camacho-Collados J. Back to the Basics: A Quantitative Analysis of Statistical and Graph-Based Term Weighting Schemes for Keyword Extraction. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, 07–11 November 2021, Virtual Event / Punta Cana, Dominican Republic. Association for Computational Linguistics; 2021. P. 8089–8103. https://doi.org/10.18653/v1/2021.emnlp-main.638</mixed-citation>
      </ref>
      <ref id="cit14">
        <label>14</label>
        <mixed-citation xml:lang="ru">Hearst M.A., Pedersen E., Patil L., et al. An Evaluation of Semantically Grouped Word Cloud Designs. IEEE Transactions on Visualization and Computer Graphics. 2020;26(9):2748–2761. https://doi.org/10.1109/TVCG.2019.2904683</mixed-citation>
      </ref>
      <ref id="cit15">
        <label>15</label>
        <mixed-citation xml:lang="ru">Dice D., Kogan A. Optimizing Inference Performance of Transformers on CPUs. arXiv. URL: https://arxiv.org/abs/2102.06621 [Accessed 19th January 2026].</mixed-citation>
      </ref>
      <ref id="cit16">
        <label>16</label>
        <mixed-citation xml:lang="ru">Xu Y., Xu R., Iter D., et al. InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT. In: Findings of the Association for Computational Linguistics: EMNLP 2023, 06–10 December 2023, Singapore. Association for Computational Linguistics; 2023. P. 13879–13892. https://doi.org/10.18653/v1/2023.findings-emnlp.927</mixed-citation>
      </ref>
      <ref id="cit17">
        <label>17</label>
        <mixed-citation xml:lang="ru">Lee J.-U., Puerto H., van Aken B. Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research. arXiv. URL: https://arxiv.org/abs/2306.16900 [Accessed 19th January 2026].</mixed-citation>
      </ref>
      <ref id="cit18">
        <label>18</label>
        <mixed-citation xml:lang="ru">Desai Sh., Xu J., Durrett G. Compressive Summarization with Plausibility and Salience Modeling. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, 16–20 November 2020, Online. Association for Computational Linguistics; 2020. P. 6259–6274. https://doi.org/10.18653/v1/2020.emnlp-main.507</mixed-citation>
      </ref>
      <ref id="cit19">
        <label>19</label>
        <mixed-citation xml:lang="ru">Mei A., Kabir A., Bapat R., et al. Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, 20–25 June 2022, Marseille, France. European Language Resources Association; 2022. P. 313–318.</mixed-citation>
      </ref>
      <ref id="cit20">
        <label>20</label>
        <mixed-citation xml:lang="ru">Liang X., Li J., Wu Sh., et al. An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework Based on Semantic Blocks. In: Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, 12–17 October 2022, Gyeongju, Republic of Korea. International Committee on Computational Linguistics; 2022. P. 6415–6425.</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The authors declare no conflict of interest.</p>
      </fn>
    </fn-group>
  </back>
</article>