GemaCoreLib
The GeMA Core library
Functions
gmMergeVectors.cpp File Reference

Implementation of a set of functions for merging sorted vectors. More...

#include "gmMergeVectors.h"
#include <assert.h>
#include "gmTrace.h"
Include dependency graph for gmMergeVectors.cpp:

Functions

template<class DataT >
static void mergeVectorPair (const DataT *v1, size_t n1, const DataT *v2, size_t n2, DataT *out)
 Basic algorithm to merge two vectors storing the result in a third vector with size n1 + n2.
 
template<class DataT >
static void heapBasedMerge (const QVector< DataT * > &vectors, const QVector< size_t > &sizes, DataT *output)
 Merges a set of "k" ordered vectors whose pointers and sizes are given by vectors and sizes, storing the merged data in the given output vector with size equal to the sum of the input vector sizes. More...
 
template<class DataT >
void GmMergeKVectors (const QVector< DataT * > &vectors, const QVector< size_t > &sizes, DataT *output)
 Merges a set of "k" ordered vectors whose pointers and sizes are given by vectors and sizes, storing the merged data in the given output vector with size equal to the sum of the input vector sizes. More...
 

Detailed Description

Implementation of a set of functions for merging sorted vectors.

Author
Carlos Augusto Teixeira Mendes
Date
april, 2020

Function Documentation

◆ GmMergeKVectors()

template<class DataT >
void GmMergeKVectors ( const QVector< DataT * > &  vectors,
const QVector< size_t > &  sizes,
DataT *  output 
)

Merges a set of "k" ordered vectors whose pointers and sizes are given by vectors and sizes, storing the merged data in the given output vector with size equal to the sum of the input vector sizes.

Requires the DataT type to implement the < operator

◆ heapBasedMerge()

template<class DataT >
static void heapBasedMerge ( const QVector< DataT * > &  vectors,
const QVector< size_t > &  sizes,
DataT *  output 
)
static

Merges a set of "k" ordered vectors whose pointers and sizes are given by vectors and sizes, storing the merged data in the given output vector with size equal to the sum of the input vector sizes.

The merge is done by creating a minimum Heap with the smallest value from all vectors. At each step the minimum value is stored in the output vector and the next value from the source vector is added to the heap. Works in O(n * log k), where n is the size of the output vector and k is the number of vectors.

The implementation uses a custom based heap, instead of the std heap implementation, since it can be 50% faster by changing the minimum heap value and calling heapify() when updating the heap, instead of the available push and pop operations from the standard. A loser tree (as described in https://en.wikipedia.org/wiki/K-way_merge_algorithm) was also tried but it ended-up beeing slower than our heap implementation.

Requires the DataT type to implement the < operator

Heap data storing a pointer to the src vector and also a pointer to one position past the last one

< Pointer to the current vector value

< Pointer to a position past the end of the vector

Our custom based min heap. See Cormen for an explanation