STL vs Gnulib Performance

June 20th, 2009

STL (Standard Template Library) adds containers and algorithms to the C++ language. There is no equivalent library in the C specification but there are a number of available libraries that provide common data structure and algorithm implementations which may be used in C programs. Once such library is gnulib which provides, amongst other things, implementations for common container data structures such as lists and trees. This post provides a quick comparison of the performance of a number of the gnulib list implementations against the performance of the STL list implementation. For this analysis, gcc v4.0.1 using -O2 optimization and the gnulib source from June 13th, 2009 were used.

The Benchmark
A very simple benchmark was chosen for the performance characterization. Two operations were measured. The first was the insertion of a large number of integers into a list, with the integers being added one at a time to the end of the list. The second was the searching for elements within the list. It is acknowledged that these are very simple tests and that any results are somewhat inconclusive in the absence of a larger and more thorough suite of performance tests.

STL list benchmarking source code:


// The insert code
start = clock();
for(uint32_t i=0; i<INSERT_ITERATIONS; ++i)
    l.push_back(i);
stop = clock();

// The search code
start = clock();
for(uint32_t i=0; i<SEARCH_ITERATIONS; ++i)
    find(l.begin(), l.end(), i);
stop = clock();

Gnulib list benchmarking source code:

// The insert code
start = clock();
for(i=0; i<INSERT_ITERATIONS; ++i)
    gl_list_add_last(list, (void*)i);
stop = clock();

// The search code
start = clock();
for(i=0; i<SEARCH_ITERATIONS; ++i)
    gl_list_search(list, (void*)i);
stop = clock();

Both STL and gnulib provide a number of container implementations. For this analysis the focus is on the list container implementations. Although gnulib provides a number of “list” implementations only the gl_linked_list container is a true linked list and the comparison between this and the STL list is the most relevant comparison. However, for illustrating the comparative performance of the other gnulib list implementations, four gnulib list implementations were selected for benchmarking:

  • gl_linked_list
  • gl_linkedhash_list
  • gl_array_list
  • gl_rbtree_list

The comparative performance of the relevant STL containers that are most similar to the gnulib list containers other than the gl_linked_list will be provided in a follow-up post.

Finally, the tests were run on a Macbook running OS X 10 with a 2.4GHz Intel Core 2 Duo processor with 3MB L2 cache and 2GB of RAM.

Results
The results for each algorithm were normalized with respect to the STL list results. So, with this normalization, a number less than one means that this is faster than the STL list implementation and a number greater than one means that this is slower than the STL list implementation.

The normalized results are shown in the graph below.

list_perf

Based on the above results, the STL list implementation and the gnulib gl_linked_list implementation are very similar in terms of performance with the gnulib implementation being very slightly faster. The gl_array_list implementation offers the best insertion performance (>5x faster than the STL implementation) and the gl_linkedhash_list implementation offers the best search performance (~5000x faster than the STL implementation).

Conclusions
The pure linked list implementations of both STL list and the gl_linked_list are almost identical in performance for both tests.

While the above results show the relative performance of the STL list against the other gnulib container implementations, this is not a fair and complete comparison as the STL offers other sequence containers that have similar implementations to some of the gnulib containers analyzed here. A follow up post will fill out the comparison started here and include STL vectors and sets into the comparison.

EDIT: The original intent of this post to compare an STL list against the full suite of gnulib list implementations was deeply flawed. The gnulib “list” interface is closer to the more generic STL sequence interface and a more correct and complete comparison would be to compare the gnulib containers against the STL containers. This will be a focus of a follow-up post. I’ve edited this post to clarify the comparison in order to avoid confusion. While I realized my mistake shortly after publishing the post, I was happy to see that this was also picked up by several readers – so many thanks to those of you who provided comments for keeping me honest!

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

The Unexpected Upgrade Experience

June 14th, 2009

A couple of months ago I upgraded my hosting package during its annual renewal. Up to then I had an excellent experience with my provider and so when they made me an offer to move to their new hosting package for a similar price as my old package it seemed like a reasonable thing to do. My thinking was that since the old package was no longer available to new customers then it would only be a matter of time before support for the old package was reduced and eventually discontinued.

Since the migration to the new package, I have been plagued with issues. Nothing catastrophic but having to raise support tickets and the resultant email tag with the support team gets tiresome after a ticket or two. The root of the problem is that the majority of the help items and howtos relate to the old package and not the new one. That would be fine if they both operated similarly – but they don’t!

I’ll use my most recent issue as an example. On the old package, the WordPress application was installed for a domain using the application manager and the application was installed into the httpdocs directory area for that domain. I stupidly assumed that the same would be true for the new package – especially since the application manager offered a WordPress installation. So, I installed the application and imported the posts from my old package and everything seemed to be fine. That was until I tried to change the permalink format for the posts. When I tried this I got a .htaccess permissions error. It turns out there were two things wrong. The first was that the WordPress files in my domain’s httpdocs area were actually a copy from my old package that were copied across by my hosting provider during the migration and were completely independent of the current WordPress installation. The second was that the application install for WordPress installed into a hidden directory tree that was so hidden that you could not browse to it without knowing the exact path before hand – and that exact path contained a five digit object id as the hidden part of its pathname. Once the support team figured this out, we easily sorted it out. 

This sort of thing should not have happened in the first place. Where differences in hosting packages exist, there should be a cheat sheet available to help people migrate between packages. The howto documentation should also cover both packages, not just the old one. Having good tech support is great but having a great product that doesn’t require tech support is so much better!

Hopefully all my problems are finally resolved. I’m on the fence as to whether or not I’ll stick with my provider … one more issue will push me over!

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

Frequent Informal Code Reviews

June 13th, 2009

The traditional code review model is to write a bunch of code and then gather the troops to review it once its done and ready. Following the review the author then spends a couple of hours/days in rework.
In contrast one of the benefits of pair programming is the just in time scrutiny of a second pair of eyes while the code is being developed. This just in time scrutiny helps pick up problems early and the cost of rework is minimal. Pair programming does however have some pitfalls and is not as prevalent as it supporters would like.
There is a middle ground that involves team members spending a few minutes each day informally reviewing any changes that were committed to the source code repository in the previous day. The informal review involves reading the changes and posting questions and comments to the author. This provides “almost in time” feedback and keeps everyone in tune with the changes and direction of the code base.

This approach has the following characteristics:

  • Collective code ownership
  • Frequent informal code reviews
  • Requires regular commits to the repository
  • Requires discipline from team members to frequently review new changes – otherwise thy just build up and we are back to where we began
  • Has effect of improving code quality  – people are less likely to commit rubbish with the intent of cleaning it up later since they know they will receive instant feedback

Note that some shops mandate a code review prior to each checkin. While this approach does help keep up the quality of committed code, it does introduce delays and bottlenecks in the checkin process.

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

Embedded Coding Tips

May 18th, 2009

Good series of articles from Jack Ganassle on practical embedded coding tips. A recommended read for embedded programmers of all skill levels.

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

Ensuring Success at Code Reviews

April 26th, 2009

Volumes have been written on the subject of code reviews and various processes for conducting them. Most processes describe particular code review roles such as reader, moderator etc. and formats for conducting the code review meeting. These roles are certainly important to keep the code review meeting on track and to avoid the review degenerating into religious wars on commenting styles and magic numbers. However, I have witnessed many code reviews that had brilliant moderators and readers but failed to find any significant defects. It wasn’t because the code under review was defect free because serious defects were often found during testing. At one point a team was seriously considering scraping code reviews due to low return for a huge effort. So what went wrong?

I believe there were two general problems. The first was that the areas under review were rather specialist areas involving complex design and code. In general, most of the reviewers were not familiar with either the code or the details of the design prior to the review. The second problem was that in many cases the people reviewing the code did not have sufficient levels of experience in the area under review to be considered experts in that area. On average more than 50% of the reviewers were new developers ramping up on the technology area and in some cases even ramping up on the programming language itself. The code reviews were used as a ramp-up task for these developers and while it certainly provided an opportunity for these developers to ask questions about various parts of the code, they were not in a position to be able to critically review the code.

So what can we learn from this? My key takeaways are:

  1. Ensure all reviewers are sufficiently familiar with the design and the code prior to the review. A one hour presentation on the design is usually not enough to provide this familiarization.
  2. Ensure sufficient experts are involved in the review.
  3. Using code reviews as part of a newbie’s ramp up is a good idea, but the newbies shouldn’t be relied upon to provide the critical feedback for the review. As a guideline, newbies should not compromise more than 25% of the reviewers.
  4. If any of the above are not satisfied then postpone the review until they can be satisfied.

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

SCRUM and Distributed Teams

April 15th, 2009

Good paper on how to increase the chance of success when using SCRUM and agile techniques with distributed teams. Interesting use of the term “project gravity”. From the abstract:

Distributed agile can work, but it is risky. Even more than traditional agile implementations, teams tend to fall into two failure modes: “command and control” or team fragmentation. Distribution between teams or team members has two distinct dimensions: separation and project gravity. Distribution, including physical distance, is the most commonly recognized barrier to effective distributed teams. Gravity is the other, less recognized dimension of separation and is some stable center which teams can organize around. This can take the form of a stable product backlog, a clear mission, or an inspiring product owner. Failing to recognize where your project lies along the dimensions of separation will make it more prone to the two failure modes. Both failure modes are discussed in terms of the dimensions of distribution, with suggestions for remedial actions. In either case, leadership must be aware of both dimensions of a team/project’s distribution, and act accordingly.

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

Minimize Context Switches Through Priority Assignment

March 29th, 2009

A context switch is often one of the most expensive operation in an RTOS (Real Time Operating System). Hence, application developers should aim to minimize the number of context switches in order to minimize the RTOS overhead.

In a pre-emptive RTOS, both the choice of thread priority assignment and the order in which threads become available to run are key to minimizing the number of context switches that occur.

In a series of articles on context switching, William E. Lamie presents data which shows that context switching overhead is minimized when all threads are assigned the same priority. This means that by assigning the same priority to all threads, the system throughput will be maximized – i.e. the fewer the cycles that are spent on context switching overhead then the more that can be spent on application processing.

However, in real-time systems latency is often a concern. Where latency is a concern, the thread that runs the latency sensitive task should be assigned a higher priority.  This will introduce more context switches into the system which will impact throughput.

To achieve the best of both worlds, threads should be assigned an equal priority by default and only latency sensitive tasks should be assigned higher priorities.

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

UML State Machines

March 26th, 2009

An excellent series of articles on UML state machines and on state machine theory in general …

In particular, I like the practical examples and discussion of implementation options for various aspects of the notation.

This series of articles is taken from the book: Practical UML State Charts In C/C++, Second Edition by Miro Samek.

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

Regular Expressions for Counting Lines of Code

January 15th, 2009

One of the applications that I implement as a means of gaining experience in different programming languages is an application to generate code metrics. I implement the application in the chosen language and use it to count comments, lines of code etc. in source files of a variety of programming languages. Implementing this application is a useful learning exercise since it provides exposure to a variety of language features such as file handling, regular expressions and also csv and html generation (for presenting the output statistics). Correctly using regular expressions for finding source code comments right can be a tricky thing and this is the focus of this post. A Ruby based implementation of this application, called countloc, is hosted on RubyForge for reference.

A basic set of code metrics include:

  • lines of code 
  • comment lines
  • blank lines
  • total lines

For comment lines, there are three types of comments to worry about:

  • single line comments
  • multi-line comments
  • mixed comments and code in a single line

For multiline comments, it is necessary to detect the beginning and the end of the comment. For a mixture of comments and code on a single line, it is necessary to distinguish between lines containing only comments so that the “lines of code” count can be incremented. The set of regular expressions used by countloc for finding comments in Ruby, Python, Java, C++, C, C# code is shown below. Note that languages such as C, C++, C# and Java all share similar commenting styles and they are covered under the cplusplus category. Note also that escaping is required for encoding the //, /* and */ characters for C++ comments in Ruby regular expressions.

COMMENT_PATTERNS = {
  :ruby => {
    :single_line_full => /^\s*#/,
    :single_line_mixed => /#/,
    :multiline_begin => /=begin(\s|$)/,
    :multiline_end => /=end(\s|$)/,
    :blank_line => /^\s*$/
  },

  :python => {
    :single_line_full => /^\s*#/,
    :single_line_mixed => /#/,
    :multiline_begin => /^\s*"""/,
    :multiline_end => /"""/,
    :blank_line => /^\s*$/
  },

  :cplusplus => {
    :single_line_full => /^\s*\/\//, 
    :single_line_mixed => /\/\//,
    :multiline_begin => /\/\*/,
    :multiline_end => /\*\/\s*$/,
    :multiline_begin_mixed => /^[^\s]+.*\/\*/,
    :multiline_end_mixed => /\*\/\s*[^\s]+$/,
    :blank_line => /^\s*$/
  }
}

As each line of source code is read, the above regular expressions are evaluated in the following order with the convention that as soon as a match is found, no further regular expressions need to be evaluated … unless otherwise noted.

  1. multiline_end_mixed (if within a multiline comment) 
  2. multiline_end (if within a multiline comment)
  3. multiline_begin … then check for multiline_begin_mixed and multiline_end (to check for a complete  multiline comment on a single line!)
  4. single_line_full
  5. single_line_mixed
  6. blank_line

Based on the countloc implementation and the code samples used to test it, this order appears to be sufficient. However, I sometimes find a corner case not covered by the regular expressions or evaluation order given above, so I am interested in any feedback with suggested improvements or optimizations.

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post

Graphviz and the Visitor Pattern

November 16th, 2008

Graphviz is an excellent and free graph visualization program that provides a simple way to generate graphs for pretty much anything with a variety of layouts. The Graphviz layout programs take descriptions of graphs in a simple text language. These descriptions can be created and edited manually but are more usually generated programatically – especially for large data sets. 

This post provides an example of using the Visitor pattern to generate a directed graph visualization of a trie using the Graphviz dot layout manager. Full python source code listing of this example is available here.

A Trie class is used to represent the trie data structure. This class is composed of a set of nodes representing the trie nodes. These nodes are anchored via a root node. The code for adding data to the trie is omitted for clarity but is available in the full source listing

class Trie:
    def __init__(self):
        self.root = Node()

    def walk(self, visitor):
        self.root.walk(visitor)

From a visitor pattern perspective, the Node.walk() method is used to traverse the nodes and to pass each node encountered to the visitor object.

class Node:
    def walk(self, visitor):
        visitor.visitNode(self)
        for nextNode in self.children.values():
            nextNode.walk(visitor)

The GraphvizVisitor class is a Visitor subclass that generates the appropriate dot language representation for each visited node.

# Visitor interface
class Visitor:
    def visitNode(self, node):
        pass

# Visitor subclass that generates Graphviz dot file contents.
class GraphvizVisitor:
    def __init__(self, dotGen):
        self.dotGen = dotGen

    def visitNode(self, node):
        if node.isOutputNode:
            dotGen.setNodeProperty(node.name, 'shape', 'doublecircle')
        for edge, nextNode in node.children.items():
            dotGen.addLink(node.name, nextNode.name, edge)

The dotGen object is an instance of the DotFileGenerator class which is used as a simple utility class to generate the dot file.

class DotFileGenerator:
    def __init__(self, name):
        self.name = name
        self.outf = open(name+'.dot', 'w')
        print >> self.outf, 'digraph %s {' % (self.name)
        print >> self.outf, 'trankdir="LR";'
        print >> self.outf, 'tnode [shape = "circle"];'

    def setNodeProperty(self, nodeName, property, value):
        print >> self.outf, 't%s [%s = %s];' %(nodeName, property, value)

    def addLink(self, lhs, rhs, label):
        print >> self.outf, 't%s -> %s [label = %s];' % (lhs, rhs, label)

    def close(self):
        self.outf.close()

Consider the following example to illustrate the generated output:

# Build a trie
trie = Trie()
trie.addPattern("abc")
trie.addPattern("abcdef")
trie.addPattern("abcxyz")
trie.addPattern("abdef")

# Generate the dot file contents by walking the trie
dotGen = DotFileGenerator('trie')
visitor = GraphvizVisitor(dotGen)
trie.walk(visitor)
dotGen.close()

This produces the following dot file contents:

digraph trie {
	rankdir="LR";
	node [shape = "circle"];
	0 -> 1 [label = a];
	1 -> 2 [label = b];
	2 -> 3 [label = c];
	2 -> 10 [label = d];
	3 [shape = doublecircle];
	3 -> 7 [label = x];
	3 -> 4 [label = d];
	7 -> 8 [label = y];
	8 -> 9 [label = z];
	9 [shape = doublecircle];
	4 -> 5 [label = e];
	5 -> 6 [label = f];
	6 [shape = doublecircle];
	10 -> 11 [label = e];
	11 -> 12 [label = f];
	12 [shape = doublecircle];
}

After passing this dot file through Graphviz, we get:

Post to Twitter Tweet This Post Post to Digg Digg This Post Post to Reddit Reddit Post to StumbleUpon Stumble This Post