Post-Increment vs Pre-Increment

There is a well known C++ BKM for preferring pre-increment (++i) over post-increment (i++). The rationale is that when post-incrementing and object, the object must increment itself and then return a temporary containing its old value. While this makes perfect sense for complex types (e.g. STL iterators) I’ve always wondered if it holds true for built-in types such as int. Let’s find out.

First let’s create a trivial example to examine:

#include <iostream>

using namespace std;

int main()
	// post-imcrement
	for(int i=0; i<10; i++)
		cout << "[" << i << "]" << endl;

	// pre-imcrement
	for(int i=0; i<10; ++i)
		cout << "[" << i << "]" << endl;

	return 0;

Now using the Visual Studio 2008 C++ compiler, let’s take a look at the generated assembly code. First let’s keep things simple and disable compiler optimizations (/Od):

	// post-imcrement
	for(int i=0; i<10; i++)
004114D3  mov         dword ptr [i],0 
004114D5  jmp         main+30h (4114E0h) 
004114D7  mov         eax,dword ptr [i] 
004114DA  add         eax,1 
004114DD  mov         dword ptr [i],eax 
004114E0  cmp         dword ptr [i],0Ah 
004114E4  jge         main+86h (411536h) 

<... snip ...>

	// pre-imcrement
	for(int i=0; i<10; ++i)
00411536  mov         dword ptr [i],0 
0041153D  jmp         main+98h (411548h) 
0041153F  mov         eax,dword ptr [i] 
00411542  add         eax,1 
00411545  mov         dword ptr [i],eax 
00411548  cmp         dword ptr [i],0Ah 
0041154C  jge         main+0EEh (41159Eh) 

The relevant parts of the assembly code are highlighted. As you can see, there is no difference between the generated code for post-increment and pre-increment.

Now, with compiler optimizations enabled (\O2) and the relevant parts highlighted:

	// post-imcrement
	for(int i=0; i<10; i++)
00401008  xor         esi,esi 
0040100A  lea         ebx,[ebx] 
<... snip ...>
0040104A  inc         esi  
0040104B  cmp         esi,0Ah 
0040104E  jl          main+10h (401010h) 

	// pre-imcrement
	for(int i=0; i<10; ++i)
00401050  xor         esi,esi 
<... snip ...>
0040108C  inc         esi  
0040108D  cmp         esi,0Ah 
00401090  jl          main+52h (401052h) 

Again there is no difference in the generated assembly between post-increment and pre-increment.

So, based on this quick experiment, there is no difference between pre-increment and post-increment for int types when using the Visual Studio 2008 C++ compiler with optimizations enabled or disabled.

Note that this doesn’t change the BKM to prefer pre-increments over post-increments when using integer types. The above experiment was only conducted with a single compiler – other compilers may behave differently and so it is best to err on the side of caution and use pre-increments where possible. Also, the use of pre-increment adds some future proofing – if the type of the variable being incremented changed to a more complex type then pre-increment then becomes the optimal choice.

Update: STL vs Gnulib Performance

Following up on a previous post about the relative performance of STL vs gnulib, this post extends and completes the analysis to include the stl::vector and stl::deque containers. These containers provide a more direct comparison against the Gnulib array-list and carray-list implementations. The results are shown below. Please refer to the previous post on this subject for details on how the results were collected including source code.

DISCLAIMER: Please note that the intent of the post is not to suggest that gnulib should be used instead of STL in C++ programs, but rather to point out that although not part of the C language specification, gnulib offers C programmers a set of containers whose performance is comparable to the STL container performance available to C++ programmers.

Push back (insert at end) performance

Push back (insert at end) performance

Search Performance

Search (unsorted) Performance

Note that the vertical axis represents seconds and the lower the bar for each algorithm, the better.

Based on the above results, it can be seen that the performance of the stl::list is comparable to the gnulib linked_list for both insertions (at the end) and unsorted searching. Gnulib array_list is comparable in implementation to the stl::vector but proves to be almost 2x faster for insertions and ~1.5x slower for searching for this test. The stl::deque is also comparable to the gnulib array_list and proves to be slightly faster for insertions and just under 3x slower for searching.

As a final note, this analysis only performed comparisons based on two tests – push back and unsorted find. This is just an indication of performance and far from complete. The results may well be different if different tests were chosen.

STL vs Gnulib Performance

STL (Standard Template Library) adds containers and algorithms to the C++ language. There is no equivalent library in the C specification but there are a number of available libraries that provide common data structure and algorithm implementations which may be used in C programs. Once such library is gnulib which provides, amongst other things, implementations for common container data structures such as lists and trees. This post provides a quick comparison of the performance of a number of the gnulib list implementations against the performance of the STL list implementation. For this analysis, gcc v4.0.1 using -O2 optimization and the gnulib source from June 13th, 2009 were used.

The Benchmark
A very simple benchmark was chosen for the performance characterization. Two operations were measured. The first was the insertion of a large number of integers into a list, with the integers being added one at a time to the end of the list. The second was the searching for elements within the list. It is acknowledged that these are very simple tests and that any results are somewhat inconclusive in the absence of a larger and more thorough suite of performance tests.

STL list benchmarking source code:

// The insert code
start = clock();
for(uint32_t i=0; i<INSERT_ITERATIONS; ++i)
stop = clock();

// The search code
start = clock();
for(uint32_t i=0; i<SEARCH_ITERATIONS; ++i)
    find(l.begin(), l.end(), i);
stop = clock();

Gnulib list benchmarking source code:

// The insert code
start = clock();
for(i=0; i<INSERT_ITERATIONS; ++i)
    gl_list_add_last(list, (void*)i);
stop = clock();

// The search code
start = clock();
for(i=0; i<SEARCH_ITERATIONS; ++i)
    gl_list_search(list, (void*)i);
stop = clock();

Both STL and gnulib provide a number of container implementations. For this analysis the focus is on the list container implementations. Although gnulib provides a number of “list” implementations only the gl_linked_list container is a true linked list and the comparison between this and the STL list is the most relevant comparison. However, for illustrating the comparative performance of the other gnulib list implementations, four gnulib list implementations were selected for benchmarking:

  • gl_linked_list
  • gl_linkedhash_list
  • gl_array_list
  • gl_rbtree_list

The comparative performance of the relevant STL containers that are most similar to the gnulib list containers other than the gl_linked_list will be provided in a follow-up post.

Finally, the tests were run on a Macbook running OS X 10 with a 2.4GHz Intel Core 2 Duo processor with 3MB L2 cache and 2GB of RAM.

The results for each algorithm were normalized with respect to the STL list results. So, with this normalization, a number less than one means that this is faster than the STL list implementation and a number greater than one means that this is slower than the STL list implementation.

The normalized results are shown in the graph below.


Based on the above results, the STL list implementation and the gnulib gl_linked_list implementation are very similar in terms of performance with the gnulib implementation being very slightly faster. The gl_array_list implementation offers the best insertion performance (>5x faster than the STL implementation) and the gl_linkedhash_list implementation offers the best search performance (~5000x faster than the STL implementation).

The pure linked list implementations of both STL list and the gl_linked_list are almost identical in performance for both tests.

While the above results show the relative performance of the STL list against the other gnulib container implementations, this is not a fair and complete comparison as the STL offers other sequence containers that have similar implementations to some of the gnulib containers analyzed here. A follow up post will fill out the comparison started here and include STL vectors and sets into the comparison.

EDIT: The original intent of this post to compare an STL list against the full suite of gnulib list implementations was deeply flawed. The gnulib “list” interface is closer to the more generic STL sequence interface and a more correct and complete comparison would be to compare the gnulib containers against the STL containers. This will be a focus of a follow-up post. I’ve edited this post to clarify the comparison in order to avoid confusion. While I realized my mistake shortly after publishing the post, I was happy to see that this was also picked up by several readers – so many thanks to those of you who provided comments for keeping me honest!

C++ Regular Expression Performance

A previous post gave some examples to illustrate how to use regular expressions as defined by the upcoming C++0x standard. The new regex features are certainly easy to use but what about the performance of some of the current implementations?

To find out, I compared the performance of the Micirosoft Visual Studio 2008 implementation against the regex implementation from the Boost library v1.36, and also against the PCRE v7.7 library. Microsoft introduced C++0x regex support in Visual Studio 2008 feature pack #1 and subsequently improved regex performance in Visual Studio 2008 service pack #1. Both the fp1 and sp1 implementations were included in the comparison analysis. The Boost library was chosen for comparison since this was the initial C++ regex implementation that was the basis for the proposal for inclusion into C++0x. PCRE was chosen for comparison since this is a very popular and in many respects the defacto regular expression library for use in C/C++ applications.

The regular expression patterns and data strings to use for comparison were taken from here. Three patterns of various complexity were selected and each of these was run against a set of five content strings. Each pattern/string search combination was run for 10000 iterations and the elapsed time was measured for each combination. The code used to perform the test is available here. Multiple runs were conducted to ensure that the results were consistent across each run. Once all the data was gathered, the results were normalized against libPcre – hence all the graphs show libPcre at a value of 1. By taking this normalization approach it is easier to compare the relative performance of each implementation.

ID Patterns Content Strings
1 ^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)
2 usd [+-]?[0-9]+.[0-9][0-9]
3 \b(\w+)(\s+\1)+\b usd 1234.00
4   he said she said he said no
5   same same same

The results are shown for each of the three patterns in the graphs below – one per pattern. The numbers along x-axis in each graph corresponds to the strings numbered 1-5 from the table above.


Regex Benchmark - Pattern #1


RegEx Benchmark - Pattern #2


RegEx Benchmark - Pattern #3

Based on the data shown in the graphs, a number of observations can be made:

  1. libPCRE v7.7 has the best performance for all pattern / string combinations.
  2. Performance varies significantly for different patterns. For example, for pattern #1, the boost implementation was faster than either of the Visual Studio 2008 (vc9) versions, but was slower for patterns #2 and #3. This seems to suggest that the Visual Studio 2008 implementations show relatively poor performance for patterns that contain alterations and a number of groups and nested groups.
  3. Based on patterns #1 and #2 the Visual C++ regex implementation performance has improved as claimed between feature pack 1 (fp1) and service pack 1 (sp1) but this is not completely true as can be seen from the pattern #3 results.

While the C++0x regex library makes it easier to use regular expressions in C++ code, there are still some performance gaps between the Visual Studio 2008 implementation and the performance of existing regular expression libraries such as libPCRE.

Note: The data shown in this article is based on a small sample of regular expression patterns and content strings and so the results should be taken in that context.

Regular Expressions in C++

Up to now, regular expression support in C/C++ programs was achieved using third party or open source regular expression libraries such as the PCRE library. With the addition of regex support to the C++ standard library as part of the C++0x standard update, using regular expressions in C++ programs has become much simpler. This feature is included in the TR1 draft reportwhich has already been implemented in some popular compilers such as gcc and Visual Studio 2008 (as part of service pack 1).  

Six regular expression grammars will be supported in C++0x. The default is based upon the ECMAScript grammar specified in ECMA-262. This syntax is based upon the PCRE syntax and is used by languages such as Perl, Python and Ruby which also provide built in regular expression support. Other supported grammars include the POSIX regex syntax, and the syntaxes used in tools such as awk, grep and egrep.

Here are some examples that illustrate how to perform some basic tasks with the new C++ regex component of the standard library.

Header files and namespaces:

#include <regex>

using namespace std::tr1;

Finding a match:

regex rgx("ello");
assert(regex_search("Hello World", rgx));

The above example illustrates the construction of a regex object, with the regex pattern being passed as a parameter to the regex constructor. The regex object is a specialization of the basic_regex template for working with regular expressions which are provided using sequences of chars. The regex_search()function template is then used to see if the “Hello world” string contains the “ello” pattern. This function returns true as soon as the first matching substring is found. The regex_search()function is also overloaded to provide versions that take sequence iterators as params (instead of a full string) and also versions that provide additional info on the match results.

Note: The use of assert() in the examples is used to highlight the “contract” provided by the api – e.g. to highlight if a function can be used in a conditional expression and if the function should return true or false for the particular example.

Finding an exact match:
The regex_match() function template is an alternative to regex_search() and is used when the target sequence must exactly match the regular expression.

regex rgx("ello");
assert(regex_match("Hello World", rgx) == false);
assert(regex_match("ello", rgx));

Finding the position of a match:
The sub_match or match_results template is used to receive search results from regex_search(). When searching char data, the library provides a ready made specialization of match_results called cmatch.

regex rgx("llo");
cmatch result;
regex_search("Hello World", result, rgx);
cout << "Matched \"" << result.str()
    << "\" after \"" << result.prefix()
    << "\" at offset: " << result.position()
    << " with length: " << result.length()
    << endl;

Working with capture groups:
Capture groups provide a means for capturing matched regions within a regular expression. Each captured region is represented by a sub_match template object. The smatch specialization of match_results is provided by the library for working with sequences of string sub-matches.

string seq = "";
regex rgx("(.*)@(.*)");
smatch result;
regex_search(seq, result, rgx);
for(size_t i=0; i<result.size(); ++i)
    cout << result[i] << endl;

Case insensitive searches:

regex rgx("ello", regex_constants::icase);
assert(regex_search("HELLO WORLD", rgx));