cmAlgorithms: Add cmRemoveDuplicates algorithm.

Start by creating a vector to hold a unique values of the input range.  We
expect that in most cases, there will be relatively few duplicates, so
reserving enough memory for a complete copy is worthwhile.  Unlike a solution
involving a std::set, this algorithm allocates all the memory it needs
in one go and in one place, so it is more cache friendly.

Populate the unique copy with a lower_bound insert algorithm and record the
indices of duplicates.  This is the same complexity as the std::set insert
algorithm, but without the need to allocate memory on the heap and other
disadvantages of std::set.

Remove the duplicates with the cmRemoveIndices algorithm.
This commit is contained in:
Stephen Kelly 2015-02-15 19:08:12 +01:00
parent 3cfe7a4ca8
commit cebeed2486
1 changed files with 29 additions and 0 deletions

View File

@ -250,4 +250,33 @@ typename Range::const_iterator cmRemoveMatching(Range &r, MatchRange const& m)
ContainerAlgorithms::BinarySearcher<MatchRange>(m));
}
template<typename Range>
typename Range::const_iterator cmRemoveDuplicates(Range& r)
{
std::vector<typename Range::value_type> unique;
unique.reserve(r.size());
std::vector<size_t> indices;
size_t count = 0;
for(typename Range::const_iterator it = r.begin();
it != r.end(); ++it, ++count)
{
typename Range::iterator low =
std::lower_bound(unique.begin(), unique.end(), *it);
if (low == unique.end() || *low != *it)
{
unique.insert(low, *it);
}
else
{
indices.push_back(count);
}
}
if (indices.empty())
{
return r.end();
}
std::sort(indices.begin(), indices.end());
return cmRemoveIndices(r, indices);
}
#endif