Keywords: RAG systems, large language models, national projects, semantic search, automation, national goals, artificial intelligence in public administration
Integration of the RAG system for automation of search links of indicators and activities of national projects
UDC 004.8
DOI: 10.26102/2310-6018/2025.50.3.027
As the management of national projects aimed at achieving the National Development Goals of the Russian Federation grows more complex, it becomes pressing to automate the analysis of relationships between the activities planned within these projects and the indicators that measure how far project objectives have been achieved. Manual document processing is labor-intensive, subjective, and slow, which motivates the development of intelligent decision support systems. This article presents an approach to automating the analysis of links between activities and indicators of national projects. The approach automatically detects and verifies semantic "activity-indicator" links in national project documents, significantly increasing the efficiency of analytical work. It is based on a Retrieval-Augmented Generation (RAG) system that combines a locally adapted language model with vector search. The work demonstrates that integrating the RAG approach with vector search and taking the project ontology into account achieves the required accuracy and relevance of the analysis. The system is valuable both for generating interpretable justifications for the links it identifies and for finding key activities that affect the achievement of indicators across several national projects at once, including activities whose influence on those indicators is not obvious. The proposed solution opens up new opportunities for the digitalization of public administration and can be adapted to other tasks, such as identifying risks in the implementation of activities and generating new activities.
For citation: Kashirina I.L., Kirillov V.V., Albychev A.S., Starichkova J.V., Magomedov S.G., Chervyakov A.A. Integration of the RAG system for automation of search links of indicators and activities of national projects. Modeling, Optimization and Information Technology. 2025;13(3). URL: https://moitvivt.ru/ru/journal/pdf?id=2001 DOI: 10.26102/2310-6018/2025.50.3.027 (In Russ).
Received 18.06.2025
Revised 11.07.2025
Accepted 29.07.2025