Update: STL vs Gnulib Performance

Following up on a previous post about the relative performance of STL vs gnulib, this post extends and completes the analysis to include the stl::vector and stl::deque containers. These containers provide a more direct comparison against the Gnulib array-list and carray-list implementations. The results are shown below. Please refer to the previous post on this subject for details on how the results were collected including source code.

DISCLAIMER: Please note that the intent of the post is not to suggest that gnulib should be used instead of STL in C++ programs, but rather to point out that although not part of the C language specification, gnulib offers C programmers a set of containers whose performance is comparable to the STL container performance available to C++ programmers.

Push back (insert at end) performance

Push back (insert at end) performance

Search Performance

Search (unsorted) Performance

Note that the vertical axis represents seconds and the lower the bar for each algorithm, the better.

Based on the above results, it can be seen that the performance of the stl::list is comparable to the gnulib linked_list for both insertions (at the end) and unsorted searching. Gnulib array_list is comparable in implementation to the stl::vector but proves to be almost 2x faster for insertions and ~1.5x slower for searching for this test. The stl::deque is also comparable to the gnulib array_list and proves to be slightly faster for insertions and just under 3x slower for searching.

As a final note, this analysis only performed comparisons based on two tests – push back and unsorted find. This is just an indication of performance and far from complete. The results may well be different if different tests were chosen.

STL vs Gnulib Performance

STL (Standard Template Library) adds containers and algorithms to the C++ language. There is no equivalent library in the C specification but there are a number of available libraries that provide common data structure and algorithm implementations which may be used in C programs. Once such library is gnulib which provides, amongst other things, implementations for common container data structures such as lists and trees. This post provides a quick comparison of the performance of a number of the gnulib list implementations against the performance of the STL list implementation. For this analysis, gcc v4.0.1 using -O2 optimization and the gnulib source from June 13th, 2009 were used.

The Benchmark
A very simple benchmark was chosen for the performance characterization. Two operations were measured. The first was the insertion of a large number of integers into a list, with the integers being added one at a time to the end of the list. The second was the searching for elements within the list. It is acknowledged that these are very simple tests and that any results are somewhat inconclusive in the absence of a larger and more thorough suite of performance tests.

STL list benchmarking source code:

// The insert code
start = clock();
for(uint32_t i=0; i<INSERT_ITERATIONS; ++i)
stop = clock();

// The search code
start = clock();
for(uint32_t i=0; i<SEARCH_ITERATIONS; ++i)
    find(l.begin(), l.end(), i);
stop = clock();

Gnulib list benchmarking source code:

// The insert code
start = clock();
for(i=0; i<INSERT_ITERATIONS; ++i)
    gl_list_add_last(list, (void*)i);
stop = clock();

// The search code
start = clock();
for(i=0; i<SEARCH_ITERATIONS; ++i)
    gl_list_search(list, (void*)i);
stop = clock();

Both STL and gnulib provide a number of container implementations. For this analysis the focus is on the list container implementations. Although gnulib provides a number of “list” implementations only the gl_linked_list container is a true linked list and the comparison between this and the STL list is the most relevant comparison. However, for illustrating the comparative performance of the other gnulib list implementations, four gnulib list implementations were selected for benchmarking:

  • gl_linked_list
  • gl_linkedhash_list
  • gl_array_list
  • gl_rbtree_list

The comparative performance of the relevant STL containers that are most similar to the gnulib list containers other than the gl_linked_list will be provided in a follow-up post.

Finally, the tests were run on a Macbook running OS X 10 with a 2.4GHz Intel Core 2 Duo processor with 3MB L2 cache and 2GB of RAM.

The results for each algorithm were normalized with respect to the STL list results. So, with this normalization, a number less than one means that this is faster than the STL list implementation and a number greater than one means that this is slower than the STL list implementation.

The normalized results are shown in the graph below.


Based on the above results, the STL list implementation and the gnulib gl_linked_list implementation are very similar in terms of performance with the gnulib implementation being very slightly faster. The gl_array_list implementation offers the best insertion performance (>5x faster than the STL implementation) and the gl_linkedhash_list implementation offers the best search performance (~5000x faster than the STL implementation).

The pure linked list implementations of both STL list and the gl_linked_list are almost identical in performance for both tests.

While the above results show the relative performance of the STL list against the other gnulib container implementations, this is not a fair and complete comparison as the STL offers other sequence containers that have similar implementations to some of the gnulib containers analyzed here. A follow up post will fill out the comparison started here and include STL vectors and sets into the comparison.

EDIT: The original intent of this post to compare an STL list against the full suite of gnulib list implementations was deeply flawed. The gnulib “list” interface is closer to the more generic STL sequence interface and a more correct and complete comparison would be to compare the gnulib containers against the STL containers. This will be a focus of a follow-up post. I’ve edited this post to clarify the comparison in order to avoid confusion. While I realized my mistake shortly after publishing the post, I was happy to see that this was also picked up by several readers – so many thanks to those of you who provided comments for keeping me honest!

Visual C++ 2008 Feature Pack Released

Previous posts have mentioned the Visual C++ 2008 feature pack which was available as a beta release. The full version has been released and is available for download

The previously reported bug with array::max_size() has now been fixed.

Some TR1 related resources from Microsoft:

C++ Lambda Functions

A previous post showed a code snippet for dumping the contents of an STL container using an ostream_iterator. Things have now gotten a little bit easier with the introduction of lambda functions into the upcoming C++0x standard. Lambda functions allow pieces of code to be passed around as if they were ordinary objects. Applying lambda functions for dumping the contents of an STL container gives:

vector v(10);
generate(v.begin(), v.end(), rand);
for_each(v.begin(), v.end(),
    [](int& x){ cout << x << " "; }) [/sourcecode] OK, so the syntax looks a bit strange at first, but it does add a powerful construct to the language. Lambdas have only just been added to the C++0x standard and support for lambdas are not included in the beta version of the Visual Studio 2008 TR1 feature pack.

For more details and examples of lambda functions, check out Herb Sutter’s recent trip report from the February/March ISO C++ standards meeting.

C++ TR1: array VS 2008 Bug

Microsoft have recently released a Beta version of the Visual Studio 2008 Feature Pack which includes support for most of the C++ standard library extensions described in TR1. Alas, the beta sticker is appropriate as there are still some bugs to be ironed out.

The TR1 array container template provides functionality for implementing a fixed size array. This is a halfway point between plain old C style arrays and C++ STL vectors. The defining property of the array container is that its size is fixed. Consider and example of an array of 4 integers

array<int, 4> = { 0, 1, 2, 3 };

Since it adheres to the STL container rules, it must implement methods such as size(), and max_size(). For the array container, both of these should return the size of the array, which is fixed. In the above example, both should return 4. However, when using the VS 2008 TR1 implementation of array, a bug appears. The code:

#include using namespace std;
using namespace std::tr1;

int main()
    array arr = {1, 2, 3, 4};
    cout << "size: " << arr.size() << endl;     cout << "max_size: " << arr.max_size() << endl;     return 0; }[/sourcecode]

Produces the output:

size: 4max_size: 1073741823

instead of:

size: 4max_size: 4

If we take a look at the implementation of max_size() we can see the problem

size_type max_size() const
    // return maximum possible length of sequence
    size_type _Count = (size_type)(-1) / sizeof(_Ty);
    return (0 < _Count ? _Count : 1); }[/sourcecode] Instead of simply retuning N (the size of the array), it performs the same computation as if this was a vector. This issue has been logged as a bug with Microsoft and will hopefully be fixed before the "Gold" release of the feature pack.




Simple container dump using STL iterator

Quick and dirty printing of containers contents in C++ using STL ostream_iterator …

#include <iostream>
#include <algorithm>
#include <vector>

using namespace std;

int main()
    vector<int> v(10);
    generate(v.begin(), v.end(), rand);
    copy(v.begin(), v.end(),
        ostream_iterator<int>(cout, "n"));

    return 0;