qsort vs. std::sort
When writing high performance code, there are pretty well two options: C and C++.
- C is essentially a portable wrapper around assembly. You can tell exactly how much every action costs, where memory is allocated, and where it is freed. Nothing is hidden. However, because of this direct control, writing C code is very tedious and error prone.
- C++ is a bit higher level. You give up some control in exchange for ergonomics. For example, returning a
vector
from a function is the difference between an O(1) and O(n) operation, depending on if the compiler implements return value optimization. However, C++ templates give the compiler more type information, which it can sometimes use to generate faster code.
A classic example of when C++ outperforms C is sorting. In item 46 of his book Effective STL, Scott Meyers claims that std::sort
can outperform qsort
by up to 670% because C++ will inline the comparison function during template instantiation. Let's see how they compare with sorting a vector of a million random integers. Writing and measuring accurate benchmarks is difficult, so we're going to let the Nonius framework do it for us.
// sort.cpp
#define NONIUS_RUNNER
#include "nonius.h++"
#include <algorithm>
#include <cstdlib>
#include <random>
// qsort
static int comp(const void* const a, const void* const b) {
const int arg1 = *static_cast<const int*>(a);
const int arg2 = *static_cast<const int*>(b);
// we can't simply return a - b, because that might under/overflow
return (arg1 > arg2) - (arg1 < arg2);
}
// std::sort with no inlining
struct compare_noinline {
__attribute__((noinline))
bool operator()(const int a, const int b) const {
return a < b;
}
};
// std::sort with inlining
struct compare {
// the compiler will inline this automatically
bool operator()(const int a, const int b) const {
return a < b;
}
};
static std::vector<int> random_vector(const size_t size) {
std::random_device seed{};
std::default_random_engine engine{seed()};
std::uniform_int_distribution<int> dist{std::numeric_limits<int>::min(),
std::numeric_limits<int>::max()};
std::vector<int> vec(size);
std::generate(vec.begin(), vec.end(), [&]{ return dist(engine); });
return vec;
}
// generate a vector of a million random integers
constexpr size_t size = 1'000'000;
static const auto rand_vec = random_vector(size);
NONIUS_BENCHMARK("qsort", [](nonius::chronometer meter) {
// Nonius does multiple runs of the benchmark, and each one needs a new
// copy of the original vector, otherwise we'd just be sorting the same
// one over and over
const auto runs = static_cast<size_t>(meter.runs());
std::vector<std::vector<int>> vectors{runs, rand_vec};
meter.measure([&](const size_t run) {
auto& current_vec = vectors[run];
std::qsort(current_vec.data(), current_vec.size(), sizeof(int), comp);
});
});
NONIUS_BENCHMARK("std::sort noinline", [](nonius::chronometer meter) {
const auto runs = static_cast<size_t>(meter.runs());
std::vector<std::vector<int>> vectors{runs, rand_vec};
meter.measure([&](const size_t run) {
auto& current_vec = vectors[run];
std::sort(current_vec.begin(), current_vec.end(), compare_noinline{});
});
});
NONIUS_BENCHMARK("std::sort inline", [](nonius::chronometer meter) {
const auto runs = static_cast<size_t>(meter.runs());
std::vector<std::vector<int>> vectors{runs, rand_vec};
meter.measure([&](const size_t run) {
auto& current_vec = vectors[run];
std::sort(current_vec.begin(), current_vec.end(), compare{});
});
});
Let's compile this bad boy.
# Apple Clang 8.0.0
# Note that Nonius has a dependency on Boost
$ clang++ -std=c++14 -stdlib=libc++ -O3 -march=native sort.cpp -o sort
$ ./sort
Nonius prints out quite a lot of information, so I'll summarize the most important bits.
qsort 253 ms +/- 8 ms
std::sort noinline 137 ms +/- 6 ms
std::sort inline 88 ms +/- 5 ms
With no inlining, std::sort
is already 1.8x faster than qsort
. I am unsure why this is, it could be from differences in algorithms. Inlining bumps this up to 2.9x faster: an impressive speedup.