References

moitvivt

Моделирование, оптимизация и информационные технологии

Modeling, Optimization and Information Technology

2310-6018

Publisher

427

ИССЛЕДОВАНИЕ ВОЗМОЖНОСТЕЙ УСКОРЕНИЯ АЛГОРИТМОВ ПАРАЛЛЕЛЬНОЙ СОРТИРОВКИ RADIX НА GPGPU

THE RESEARCH OF POSSIBILITIES OF ACCELERATION OF PARALLEL ALGORITHMS FOR RADIX SORTING ON GPGPU

Воронцов

Глеб Владимирович

Vorontsov

Gleb Vladimirovich

lomovivanvivt@yandex.ru aff-1

Преображенский

Андрей Петрович

Preobrazhensky

Andrei Petrovich

app@vivt.ru aff-2

Чопоров

Олег Николаевич

Choporov

Oleg Nikolaevich

choporov_oleg@mail.ru aff-3

Воронежский институт высоких технологий Voronezh Institute of High Technologies

Воронежский институт высоких технологий Воронежский государственный технический университет Voronezh Institute of High Technologies Voronezh State Technical University

01 01 2026

1 1

e427

2026

This work is licensed under a Creative Commons Attribution 4.0 International License

В данной работе проводится анализ алгоритма параллельной сортировки RADIX на графических процессорах (GPGPU). Вначале рассматривается наивный алгоритм поразрядной сортировки. При этом используются два вида поразрядной сортировки — по младшим и старшим разрядам. Приведен пример их использования. С тем, чтобы увеличить производительность алгоритма поразрядной сортировки предлагается использовать параллельное решение, хотя при этом возникают дополнительные проблемы, требующие своего решения. Анализируются возможные подходы по распараллеливанию, предложенные различными авторами. В рассматриваемом алгоритме данные хранятся в памяти графического процессора, и сортировка выполняется непосредственно на GPU. Этот алгоритм параллельной Radix сортировки состоит из 3-х подсистем: подсчет двоичных комбинаций в текущем разряде, префикс суммирование, окончательное соответствие ключей с вычисленными позициями. Первым шагом алгоритма является процесс подсчета частоты каждого элемента в последовательности. Для осуществления этого параллельным образом происходит разделение входного массива на блоки. Далее вычисляется локальная частота всех возможных элементов для каждого блока. Затем для каждой маски проводится префиксное суммирование. На следующем шаге получаются из локальных списков частот глобальные. Приведены результаты моделирования, продемонстрировавшие увеличение в несколько раз быстродействие предлагаемого алгоритма по сравнению с известными.

This paper analyzes a parallel radix sorting algorithm on graphics processors (GPGPU). First, the naive radix sorting algorithm is considered. Two variants of radix sort are used: least significant digit (LSD) first and most significant digit (MSD) first. An example of their use is given. To increase the performance of radix sort, a parallel solution is proposed, although this raises additional problems that must be solved. Possible approaches to parallelization proposed by various authors are analyzed. In the algorithm under consideration, the data is stored in GPU memory and sorting is performed directly on the GPU. This parallel radix sort consists of three subsystems: counting the binary combinations in the current digit, prefix summation, and the final placement of keys at the computed positions. The first step of the algorithm counts the frequency of each element in the sequence. To do this in parallel, the input array is divided into blocks. The local frequency of all possible elements is then computed for each block. Next, a prefix sum is computed for each mask. In the following step, the global frequency lists are obtained from the local ones. Simulation results are presented that show the proposed algorithm to be several times faster than known algorithms.
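The three stages named in the abstract (per-block counting, prefix summation, placement of keys at computed positions) can be illustrated with a minimal sequential sketch in Python. It is not the authors' GPU implementation: the function name `radix_sort_blocked`, the 2-bit digit width, the 32-bit key width and the block size of 4 are all assumptions chosen for illustration, and each "block" is processed in an ordinary loop where a GPU would assign it to a group of threads.

```python
from typing import List


def radix_sort_blocked(keys: List[int], bits_per_pass: int = 2,
                       key_bits: int = 32, block_size: int = 4) -> List[int]:
    """LSD radix sort of non-negative integers, organized block by block.

    Hypothetical illustration only: the parameters are assumptions,
    not values taken from the paper.
    """
    radix = 1 << bits_per_pass          # number of possible digit values
    mask = radix - 1                    # bit mask that extracts one digit
    data = list(keys)

    for shift in range(0, key_bits, bits_per_pass):
        # Split the input into blocks (one GPU thread block each, conceptually).
        blocks = [data[i:i + block_size]
                  for i in range(0, len(data), block_size)]

        # Stage 1: local frequency of every digit value inside each block.
        local = [[0] * radix for _ in blocks]
        for b, block in enumerate(blocks):
            for k in block:
                local[b][(k >> shift) & mask] += 1

        # Stage 2: exclusive prefix sum over the counts, taken in
        # digit-major, block-minor order. This turns the local frequency
        # lists into global starting positions and keeps the sort stable.
        offsets = [[0] * radix for _ in blocks]
        running = 0
        for d in range(radix):
            for b in range(len(blocks)):
                offsets[b][d] = running
                running += local[b][d]

        # Stage 3: place each key at its computed position.
        out = [0] * len(data)
        for b, block in enumerate(blocks):
            cursor = offsets[b][:]      # next free slot per digit for this block
            for k in block:
                d = (k >> shift) & mask
                out[cursor[d]] = k
                cursor[d] += 1
        data = out

    return data
```

Calling `radix_sort_blocked([170, 45, 75, 90, 802, 24, 2, 66])` returns the keys in ascending order. In this sketch stages 1 and 3 touch only one block's data at a time, which is what makes them candidates for parallel execution, while stage 2 is the step that combines the per-block results.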

параллельная сортировка алгоритм данные процессор

parallel sorting, algorithm, data, processor

Исследование выполнено без спонсорской поддержки.

The study was performed without external funding.

References

Ha L. Fast 4-way parallel radix sorting on GPUs / L. Ha, J. Kruger, C. T. Silva // [Electronic resource]: http://www.sci.utah.edu/~csilva/papers/cgf.pdf

Wikipedia. Radix sort // [Electronic resource]: https://en.wikipedia.org/wiki/Radix_sort

Batcher K. Sorting Networks and Their Applications / K. Batcher // Proceedings of the AFIPS Spring Joint Computer Conference, vol. 32, 1968, pp. 307–314.

Ajtai M. An O(n log n) sorting network / M. Ajtai, J. Komlós, E. Szemerédi // In: STOC ’83: Proceedings of the fifteenth annual ACM symposium on Theory of computing, 1983, pp. 1–9.

Leighton T. Tight bounds on the complexity of parallel sorting / T. Leighton // In: STOC ’84: Proceedings of the sixteenth annual ACM symposium on Theory of computing (New York, NY, USA, 1984), ACM, pp. 71–80.

Zagha M. Radix sort for vector multiprocessors / M. Zagha, G. E. Blelloch // In: Supercomputing ’91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing (New York, NY, USA, 1991), pp. 712–721.

Kipfer P. UberFlow: a GPU-based particle engine / P. Kipfer, M. Segal, R. Westermann // In: HWWS ’04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (New York, NY, USA, 2004), ACM Press, pp. 115–122.

Sintorn E. Fast parallel GPU-sorting using a hybrid algorithm / E. Sintorn, U. Assarsson // In: Workshop on General Purpose Processing on Graphics Processing Units (GPGPU) (2007). [Electronic resource]: http://www.cse.chalmers.se/~uffe/hybridsort.pdf

Sengupta S. Scan primitives for GPU computing / S. Sengupta, M. Harris, Y. Zhang, J. D. Owens // In: Graphics Hardware 2007 (Aug. 2007), ACM, pp. 97–106.

Harris M. Parallel prefix sum (scan) with CUDA / M. Harris, S. Sengupta, J. D. Owens // GPU Gems 3. Boston: Addison Wesley, 2007, pp. 851–876.

Blelloch G. E. Prefix Sums and Their Applications / G. E. Blelloch // [Electronic resource]: https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf

The authors declare that there are no conflicts of interest present.