Implementation of a set of functions for integer vector sorting of triplet data. More...

#include "gmIntegerSort.h"
#include "gmMemory.h"
#include <QVarLengthArray>
#include "ska_sort.hpp"
#include "gmOmp.h"
#include <assert.h>
#include "gmTrace.h"

Include dependency graph for gmIntegerSort.cpp:

Macros
#define	CS_MASK(v) (( ((KeyT)(v)) >> shift) & mask)
	Macro used by countSort to extract the sort key from the current value v.

Functions
template<class DataT , class KeyT , class BucketT >
static void	countSort (DataT src, size_t n, DataT dst, unsigned maskSize, unsigned shift, BucketT *bucketMem)
	A serial, stable, Count sort implementation based on the algorithm in "Introduction to Algorithms", Cormen et all. More...

template<class DataT , class KeyT , class BucketT >
static void	parallelCountSort (DataT src, size_t n, DataT dst, unsigned maskSize, unsigned shift, BucketT *bucketMem, int nt)
	Simmilar to countSort() but running in parallel with 'nt' threads. More...

template<class DataT , class KeyT , class BucketT >
DataT *	GmLsdRadixSort (DataT src, size_t n, DataT aux, bool auxCopy, unsigned digitBits, unsigned keyBits, unsigned keyOffset)
	A serial, stable, Radix sort implementation based on the algorithm in "Introduction to Algorithms", Cormen et all. More...

template<class DataT , class KeyT , class BucketT >
DataT *	GmPLsdRadixSort (DataT src, size_t n, DataT aux, bool auxCopy, int nt, unsigned digitBits, unsigned keyBits, unsigned keyOffset)
	Simmilar to GmLsdRadixSort() but executing with nt threads. More...

template<class DataT >
GMC_API_EXPORT void	GmSkaRadixSort (DataT *src, size_t n)
	Simple stub for calling ska_sort(). Used to include the external lib inside GemaCoreLib so that clients don't need to include its path.

template<class DataT , class ExtractKey >
GMC_API_EXPORT void	GmSkaRadixSort (DataT *src, size_t n, ExtractKey &&key)
	Simple stub for calling ska_sort(). Used to include the external lib inside GemaCoreLib so that clients don't need to include its path.

Detailed Description

Implementation of a set of functions for integer vector sorting of triplet data.

Author: Carlos Augusto Teixeira Mendes

Date: april, 2020

Function Documentation

◆ countSort()

template<class DataT , class KeyT , class BucketT >

static void countSort	(	DataT *	src,
		size_t	n,
		DataT *	dst,
		unsigned	maskSize,
		unsigned	shift,
		BucketT *	bucketMem
	)

static

A serial, stable, Count sort implementation based on the algorithm in "Introduction to Algorithms", Cormen et all.

This is a generic implementation that can be used for any vector data type where the sorting key is an unsigned number and is present as the FIRST field in the data type structure. It also receives as parameters the number of "digit" bits that should be used for sorting (maskSize) and its position inside the key (shift).

This implementation is NOT in-place and needs a destination vector with the same size as the input one. It also uses stack space equal to 2^maskSize * sizeof(BucketT) bytes for the counting buckets, making a maskSize of 16 bits its "practical" limit (64K * sizeof(BucketT)). The bucket size must be provided through the bucketMem vector. It is not allocated inside the function since it can be reused along multiple calls when countSort() is called by

radix sort (it also can't be allocated on the stack since maskSize is

a parameter).

Keep in mind that a faster algorithm can be implemented if we restrict ourselves to a scenario where the keys are limited to 16 bits types and the digit is the full value. In that case the last loop can be replaced by just sequentially filling the vector with the bucket index value for the number of occurrences in each bucket (see reference (1)).

This implementation also does NOT handle signed numbers. If signed values are treated as unsigned, the resulting order will be inverted with negative values after positive ones. See reference (1) below for an idea of how to handle the sign bit by splitting the last loop in two.

Links: https://www.drdobbs.com/architecture-and-design/algorithm-improvement-through-performanc/224201198

Template Parameters

DataT	The type of the vector data
KeyT	The type of the key used to sort the vector. Must be unsigned.
BucketT	The unsigned type used for the count array. Must be big enough to fit the value of n.

Parameters

src	The vector to be sorted
n	The vector size
dst	The destination vector that will be used to hold the sorted data
maskSize	The number of bits in the sorting digits.
shift	The right shift size needed to bring the "digit" part of the key to the least significant bits of the key. The digit is extracted by "(key >> shift) & mask" where mask is a value with maskSize 1's on the least significant bits.
bucketMem	A vector with 2^maskSize entries

◆ GmLsdRadixSort()

template<class DataT , class KeyT , class BucketT >

DataT* GmLsdRadixSort	(	DataT *	src,
		size_t	n,
		DataT *	aux,
		bool	auxCopy,
		unsigned	digitBits,
		unsigned	keyBits,
		unsigned	keyOffset
	)

A serial, stable, Radix sort implementation based on the algorithm in "Introduction to Algorithms", Cormen et all.

This is a generic implementation that can be used for any vector data type where the sorting key is an unsigned number and is present as the FIRST field in the data type structure.

Sorting is done by running counting sort repeatedly for each key digit, starting from the least significant digit (LSD). Digit size is a function parameter. There is support for when the digit size does not evenly divides the key type size (ex: 32 bits key with 12 bits digit)

Another parameter specifies how many bits of the key must be sorted. This allows, for example, when sorting a vector with 32-bit keys and 8 bit digits, where the maximum key value is 16.000.000 (24 significant digits) to issue only 3 sorting passes instead of the usual 4.

This implementation is NOT in-place and needs a destination vector with the same size as the input one. It also uses stack space equal to 2^digitBits * sizeof(BucketT) for the counting buckets, making a digitBits of 16 bits its "practical" limit (64K * sizeof(BucketT))

This implementation also does NOT handle signed numbers. See comments in the countSort() function.

Template Parameters

DataT	The type of the vector data
KeyT	The type of the key used to sort the vector. Must be unsigned.
BucketT	The unsigned type used for the count array. Must be big enough to fit the value of n.

Parameters

src	The vector to be sorted
n	The vector size
aux	Auxiliar vector with size n used by the sort routine. Can be NULL. In that case the vector will be allocated and freed internally.
auxCopy	Flag used only if aux is different from NULL. If set to true, the sorted result will always end-up in src. If the number of sort passes is odd, a memcpy oeration will ensure that the result ends up in src.

If set to false, no extra copy will be done and the result is either in src or in aux, as reported by the function return.

Parameters

digitBits	The number of bits in the sorting digits.
keyBits	The number of bits in the key that should be considered for sorting.
keyOffset	The offset in bits for the start of the key. Usually 0.

Returns: Returns the vector that stores the sorted result. Normally will be equal to src, but if aux != NULL, auxCopy == false and the number of sorting passes is odd, will be equal to aux. Will return NULL if the function could not allocate auxiliary memory.

◆ GmPLsdRadixSort()

template<class DataT , class KeyT , class BucketT >

DataT* GmPLsdRadixSort	(	DataT *	src,
		size_t	n,
		DataT *	aux,
		bool	auxCopy,
		int	nt,
		unsigned	digitBits,
		unsigned	keyBits,
		unsigned	keyOffset
	)

Simmilar to GmLsdRadixSort() but executing with nt threads.

The extra parameter nt controls the number of requested threads. A value of 0 (or -1 for compatibility with GmThreadManager) means that omp_get_max_threads() will be used.

◆ parallelCountSort()

template<class DataT , class KeyT , class BucketT >

static void parallelCountSort	(	DataT *	src,
		size_t	n,
		DataT *	dst,
		unsigned	maskSize,
		unsigned	shift,
		BucketT *	bucketMem,
		int	nt
	)

static

Simmilar to countSort() but running in parallel with 'nt' threads.

The parallelization is done by creating a bucket vector per thread. In the count step each thread counts the occurrences in a block (partition) from the global vector.

The second step adjusts each bucket[i], from each thread, to contain number of value less than i from all threads up until that one. This gives the "position" of a value i for that thread.

With the bucket vector adjusted, the last step where each value is placed in its final position can also proceed with each thread taking care of its own block.

The nt parameter (> 0) contains the number of threads that should be used in the paralelization. The bucketMem vector points to an area with size equal to 2^maskSize * nt to be used as the per thread bucket vector.

Macros

Functions

Detailed Description

Function Documentation

◆ countSort()

◆ GmLsdRadixSort()

◆ GmPLsdRadixSort()

◆ parallelCountSort()