Archive for the ‘C++’ Category

Regular Expressions in C++

Saturday, July 12th, 2008

Until now, regular expression support in C/C++ programs has relied on third-party or open source libraries such as PCRE. With the addition of regex support to the C++ standard library as part of the C++0x standard update, using regular expressions in C++ programs has become much simpler. This feature is specified in the TR1 draft report, which has already been implemented in some popular compilers such as gcc and Visual Studio 2008 (as part of Service Pack 1).

Six regular expression grammars will be supported in C++0x. The default is based upon the ECMAScript grammar specified in ECMA-262. This syntax is modelled on Perl's regular expression syntax (as is PCRE's) and will feel familiar to users of languages such as Perl, Python and Ruby, which also provide built-in regular expression support. Other supported grammars include the POSIX basic and extended regex syntaxes, and the syntaxes used in tools such as awk, grep and egrep.
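As a rough illustration of how the choice of grammar changes what a pattern means, the sketch below compiles the same pattern text under a caller-supplied grammar flag. It assumes a C++11 compiler where the regex component lives in namespace std rather than std::tr1, and the helper name matches_with is made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compile `pattern` with the given grammar flag and
// report whether it matches the whole of `text`.
bool matches_with(const std::string& pattern,
                  std::regex_constants::syntax_option_type grammar,
                  const std::string& text)
{
    std::regex rgx(pattern, grammar);
    return std::regex_match(text, rgx);
}
```

With this helper, matches_with("a+", std::regex_constants::ECMAScript, "aaa") should be true because + is a repetition operator in the default grammar. Under std::regex_constants::basic the + is an ordinary character, so the same pattern should only match the literal text "a+".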

Here are some examples that illustrate how to perform some basic tasks with the new C++ regex component of the standard library.

Header files and namespaces:

#include <regex>

using namespace std::tr1;

Finding a match:

regex rgx("ello");
assert(regex_search("Hello World", rgx));

The above example illustrates the construction of a regex object, with the regex pattern being passed as a parameter to the regex constructor. The regex type is a specialization of the basic_regex template for working with regular expressions that are supplied as sequences of chars. The regex_search() function template is then used to see if the “Hello World” string contains the “ello” pattern. This function returns true as soon as the first matching substring is found. The regex_search() function is also overloaded to provide versions that take sequence iterators as parameters (instead of a full string) and versions that provide additional information on the match results.
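The iterator overload mentioned above can be sketched roughly as follows. This assumes a C++11 compiler with the component in namespace std rather than std::tr1; found_in_prefix is a hypothetical helper name, not part of the library.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Search only the first `count` characters of `text` by passing an
// iterator range to regex_search() instead of the whole string.
bool found_in_prefix(const std::string& text, std::size_t count,
                     const std::regex& rgx)
{
    if (count > text.size())
        count = text.size();
    return std::regex_search(text.begin(), text.begin() + count, rgx);
}
```

Searching the first 5 characters of "Hello World" for "ello" should succeed, while searching only the first 3 characters ("Hel") should not.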

Note: assert() is used in the examples to highlight the “contract” provided by the API - e.g. to show that a function can be used in a conditional expression and whether it should return true or false for the particular example.

Finding an exact match:
The regex_match() function template is an alternative to regex_search() and is used when the target sequence must exactly match the regular expression.

regex rgx("ello");
assert(regex_match("Hello World", rgx) == false);
assert(regex_match("ello", rgx));

Finding the position of a match:
The match_results template is used to receive search results from regex_search(), with each individual match held in a sub_match object. When searching char data, the library provides a ready-made specialization of match_results called cmatch.

regex rgx("llo");
cmatch result;
regex_search("Hello World", result, rgx);
cout << "Matched \"" << result.str()
    << "\" after \"" << result.prefix()
    << "\" at offset: " << result.position()
    << " with length: " << result.length()
    << endl;

Working with capture groups:
Capture groups provide a means for capturing matched regions within a regular expression. Each captured region is represented by a sub_match template object. The smatch specialization of match_results is provided by the library for working with sequences of string sub-matches.

string seq = "foo@helloworld.com";
regex rgx("(.*)@(.*)");
smatch result;
regex_search(seq, result, rgx);
for(size_t i=0; i<result.size(); ++i)
{
    cout << result[i] << endl;
}
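Individual groups can also be picked out by index rather than looped over. The following is a minimal sketch, assuming a C++11 compiler with the component in namespace std rather than std::tr1; split_address is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: split "user@domain" into its two captured parts.
// Returns a pair of empty strings when the input contains no '@'.
std::pair<std::string, std::string> split_address(const std::string& address)
{
    static const std::regex rgx("(.*)@(.*)");
    std::smatch result;
    if (!std::regex_match(address, result, rgx))
        return std::make_pair(std::string(), std::string());
    // result[0] is the whole match; result[1] and result[2] are the groups.
    return std::make_pair(result[1].str(), result[2].str());
}
```

For the address used above, split_address("foo@helloworld.com") should yield "foo" and "helloworld.com".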

Case insensitive searches:

regex rgx("ello", regex_constants::icase);
assert(regex_search("HELLO WORLD", rgx));

Demystifying the volatile keyword

Tuesday, May 13th, 2008

Following on from my earlier post about the restrict keyword, I’d like to try and dispel the myth around the volatile keyword. The meaning of volatile is a popular interview question, particularly for embedded development jobs, and I’ve heard some programmers describe the properties of this keyword as if it had super powers.

The volatile keyword does a very simple job. When a variable is marked as volatile, the programmer is instructing the compiler not to cache this variable in a register but instead to read the value of the variable from memory each and every time the variable is used. That’s it - simple isn’t it?

To illustrate the use of the keyword, consider the following example:

volatile int* vp = SOME_REGISTER_ADDRESS;
for(int i=0; i<100; i++)
    foo(*vp);

In this simple example, the pointer vp points to a volatile int. The value of this int is read from memory for each loop iteration. If volatile were not specified, it is likely that the compiler would generate optimized code that reads the value of the int once, temporarily stores it in a register and then uses the register copy during each iteration.
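A self-contained variant of the loop above might look like the sketch below. The function name sum_reads is made up for this example, and an ordinary variable stands in for the hardware register.

```cpp
// Sums `count` reads of the int that `reg` points to. Because the pointee
// is volatile, the compiler must emit a load for every iteration and may
// not hoist the read out of the loop.
int sum_reads(const volatile int* reg, int count)
{
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += *reg;
    return sum;
}
```

In real code reg would point at a memory-mapped register whose value can change between reads. With a stand-in such as volatile int fake_register = 3, calling sum_reads(&fake_register, 4) returns 12.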

Examples of where volatile is often used:

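  • When accessing hardware registers via pointers. It is necessary for the generated code to always access the hardware register’s value and never a potentially out of date copy.
  • When accessing a variable that is shared between two or more threads or between a thread and an ISR. Note that volatile only forces the memory access; it does not make the access atomic or impose any ordering between threads, so locks or other synchronization are still needed where those guarantees matter.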

Common myths about the volatile keyword:

  • A volatile variable will never reside in cache memory - e.g. within the L2 cache of a processor. This is not true. In the case where the volatile variable is shared between two software threads, it is highly likely that the variable will exist in cached memory, and the cache coherency policy will ensure that the threads see the correct and up-to-date value of the variable even when they are running on separate cores that don’t share the same cache. When the volatile variable refers to a hardware register, it is likely that the memory map of the system will be set up such that this register is not cacheable in the processor cache, but this is set up and managed by the appropriate device driver and is not something that is provided by the volatile keyword.

That’s it. So the next time you hear somebody waxing lyrical about the super powers of the volatile keyword, please feel empowered to enlighten them!

C99 restrict Keyword

Tuesday, May 13th, 2008

I was pleasantly surprised to encounter a C keyword that I had never heard of before. The keyword in question is the restrict keyword, which is a type qualifier for pointers and is a formal part of the C99 standard. This keyword allows programmers to declare that pointers which share the same type do not alias each other. This information can then be used by the compiler to make optimizations when using the pointers. If the data is in fact aliased, the results are undefined.

Consider the following example:

void* memcpy(void* restrict dst, const void* restrict src, size_t n);

This declaration tells the compiler that the memory accessed through the dst and src pointer parameters does not overlap, and so the compiler is free to apply any optimizations - including optimizations that may result in out of order reads/writes.

Mainstream compilers have varying support for this feature.

  • GCC supports it in C99 mode - specified via the “-std=c99” option - or for non-C99 code via the __restrict spelling, which GCC provides as an extension.
  • Microsoft’s Visual Studio .NET 2005/2008 compiler doesn’t support this feature as specified in the C99 standard but does provide similar support using the __restrict specifier. Microsoft also allows this keyword to be specified for both C and C++ code. See the MSDN documentation for more details on Microsoft’s implementation and the differences between its support and the C99 specification of restrict.

Finally, it should be noted that this keyword is specific to C and is not specified in the 1998 C++ specification, nor is it currently planned for inclusion in the forthcoming C++ specification update.
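For C++ code, the compiler-specific __restrict spelling mentioned above can be used in much the same way. The following is only a sketch: scale_into is a hypothetical function, and __restrict is an extension accepted by GCC, Clang and Microsoft’s compiler rather than standard C++.

```cpp
#include <cstddef>

// __restrict is a compiler extension, not standard C++. By using it the
// caller promises that dst and src never refer to overlapping memory.
void scale_into(float* __restrict dst, const float* __restrict src,
                std::size_t n, float factor)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}
```

Passing overlapping buffers to such a function breaks the promise and gives undefined results, just as with the C99 keyword.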

Thread Affinity on OS X

Tuesday, April 29th, 2008

My first experience in developing software to run on OS X has been a disappointing one. Previously, I have written software to run on Windows and Linux and in general I have found library or system API calls that provide me with the ability to access the services that I expect to be provided by the operating system. One such service is thread affinity - i.e. the ability to tie a particular thread to a given core. This is achieved using SetThreadAffinityMask() on Windows and sched_setaffinity() on Linux.
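For comparison, pinning on Linux can be sketched as below. This is Linux-only and assumes glibc’s CPU_* macros are visible (g++ defines _GNU_SOURCE by default); the function name pin_to_first_allowed_core is made up for this example.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity, CPU_* macros

// Linux-only sketch: pin the calling thread to the first core it is
// currently allowed to run on. Returns the core number, or -1 on failure.
int pin_to_first_allowed_core()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int core = 0; core < CPU_SETSIZE; ++core) {
        if (CPU_ISSET(core, &allowed)) {
            cpu_set_t only;
            CPU_ZERO(&only);
            CPU_SET(core, &only);
            // A pid of 0 means "the calling thread".
            if (sched_setaffinity(0, sizeof(only), &only) != 0)
                return -1;
            return core;
        }
    }
    return -1;
}
```

Querying the current mask first, rather than hard-coding core 0, keeps the sketch working when the process is already confined to a subset of cores.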

However, I was very surprised to find that it does not appear to be possible to do this on OS X! 

OS X Leopard introduced a thread affinity API which provides the ability to provide affinity hints to the scheduler to improve data locality in caches. Unfortunately this API does not provide the ability to tie a thread to a core. This is a major gap, especially now that all new PCs and laptops are multi-core.

So why do I want to be able to tie a thread to a core? I want to benchmark a piece of code and one of the data points of interest is to benchmark this code on a single core. With the current Mac OS X thread affinity API, this is not possible!

I’m finding it hard to believe that this service does not exist on OS X - I mean surely somebody must have wanted to run a single core benchmark on OS X running on a multi-core system. But after a couple of hours of googling and reading through the OS X APIs I have been unable to find out how to do this.

Visual C++ 2008 Feature Pack Released

Wednesday, April 9th, 2008

Previous posts have mentioned the Visual C++ 2008 feature pack, which was available as a beta release. The full version has now been released and is available for download.

The previously reported bug with array::max_size() has now been fixed.

C++ Lambda Functions

Sunday, April 6th, 2008

A previous post showed a code snippet for dumping the contents of an STL container using an ostream_iterator. Things have now gotten a little bit easier with the introduction of lambda functions into the upcoming C++0x standard. Lambda functions allow pieces of code to be passed around as if they were ordinary objects. Applying lambda functions for dumping the contents of an STL container gives:

vector<int> v(10);
generate(v.begin(), v.end(), rand);
for_each(v.begin(), v.end(),
    [](int& x){ cout << x << " "; });

OK, so the syntax looks a bit strange at first, but it does add a powerful construct to the language.
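Lambdas can also capture local variables from the enclosing scope, which is where they start to pay off over hand-written function objects. A minimal sketch, assuming a compiler with C++0x lambda support; count_above is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Counts the elements of `values` that exceed `threshold`. The lambda
// captures `threshold` by value via the [threshold] capture list.
int count_above(const std::vector<int>& values, int threshold)
{
    return static_cast<int>(
        std::count_if(values.begin(), values.end(),
                      [threshold](int x) { return x > threshold; }));
}
```

For a vector holding 1, 5 and 9, count_above(v, 4) returns 2.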

Lambdas have only just been added to the C++0x standard and support for them is not included in the beta version of the Visual Studio 2008 TR1 feature pack.

For more details and examples of lambda functions, check out Herb Sutter’s recent trip report from the February/March ISO C++ standards meeting.

C++ TR1: stdint.h still missing from Visual Studio

Wednesday, March 19th, 2008

stdint.h is a disappointing absence from both Visual Studio 2008 and the TR1 feature pack. This header was introduced in the C99 standard library to allow programmers to write more portable code by providing a set of typedefs that specify exact-width integer types, together with the defined minimum and maximum allowable values for each type. This standardized the approach for writing such things as uint32_t and should save countless hours of needless duplication for projects that like to define their own variants such as uint32, UINT32, u32, Xyz32U, etc.

Since the C++ standard was finalized in 1998, it missed this standard header by a year. In the latest update to the C++ standard (namely those extensions covered in TR1), support for this header has been added in the form of the cstdint header.
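As a small illustration of what the header buys you, the sketch below uses the exact-width types from cstdint. It assumes a compiler that ships the header in namespace std (C++0x or later); pack_bytes is a made-up function name.

```cpp
#include <cstdint>

// Packs four bytes into one 32-bit word, most significant byte first.
// std::uint32_t is exactly 32 bits wide on every platform that provides it.
std::uint32_t pack_bytes(std::uint8_t b3, std::uint8_t b2,
                         std::uint8_t b1, std::uint8_t b0)
{
    return (static_cast<std::uint32_t>(b3) << 24) |
           (static_cast<std::uint32_t>(b2) << 16) |
           (static_cast<std::uint32_t>(b1) << 8)  |
            static_cast<std::uint32_t>(b0);
}
```

Here pack_bytes(0x12, 0x34, 0x56, 0x78) returns 0x12345678 regardless of how wide int or long happen to be on the target.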

It is astonishing to find that a header that was standardized 9 years ago has still not made its way into Visual Studio 2008. Not even the recent feature pack beta, which included support for most of the TR1 extensions, contained stdint.h! The lack of support for this header was logged as a bug with Microsoft way back in 2005 but is still in the “postponed” bucket.

Thankfully, there are a number of implementations of stdint.h available, the most notable being Paul Hsieh’s cross-platform free implementation and also a Microsoft compiler specific implementation. Simply place one of these implementations into the Visual Studio standard include paths and stdint.h support magically appears … now why can’t Microsoft do something like that!

C++ TR1: array VS 2008 Bug

Saturday, March 15th, 2008

Microsoft have recently released a Beta version of the Visual Studio 2008 Feature Pack which includes support for most of the C++ standard library extensions described in TR1. Alas, the beta sticker is appropriate as there are still some bugs to be ironed out.

The TR1 array container template provides functionality for implementing a fixed size array. This is a halfway point between plain old C style arrays and C++ STL vectors. The defining property of the array container is that its size is fixed. Consider an example of an array of 4 integers:

array<int, 4> arr = { 0, 1, 2, 3 };

Since it adheres to the STL container rules, it must implement methods such as size(), and max_size(). For the array container, both of these should return the size of the array, which is fixed. In the above example, both should return 4. However, when using the VS 2008 TR1 implementation of array, a bug appears. The code:

#include <iostream>
#include <array>

using namespace std;
using namespace std::tr1;

int main()
{
    array<int, 4> arr = {1, 2, 3, 4};
    cout << "size: " << arr.size() << endl;
    cout << "max_size: " << arr.max_size() << endl;
    return 0;
}

Produces the output:

size: 4
max_size: 1073741823

instead of:

size: 4
max_size: 4

If we take a look at the implementation of max_size() we can see the problem:

size_type max_size() const
{
    // return maximum possible length of sequence
    size_type _Count = (size_type)(-1) / sizeof(_Ty);
    return (0 < _Count ? _Count : 1);
}

Instead of simply returning N (the size of the array), it performs the same computation as it would for a vector.
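The expected behaviour can be sketched with a stripped-down stand-in for the container. To be clear, fixed_array below is a hypothetical type written for this illustration, not Microsoft’s implementation or the fix that they will ship.

```cpp
#include <cstddef>

// Hypothetical, stripped-down stand-in for the TR1 array template, showing
// only the two member functions discussed above.
template <typename T, std::size_t N>
struct fixed_array
{
    T elems[N];

    std::size_t size() const     { return N; }
    // A fixed-size container can never hold more than N elements.
    std::size_t max_size() const { return N; }
};
```

With fixed_array<int, 4>, both size() and max_size() return 4, which is the output the test program above should have produced.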

This issue has been logged as a bug with Microsoft and will hopefully be fixed before the “Gold” release of the feature pack.

C++ TR1

Saturday, March 15th, 2008

Effective C++, Third Edition summarizes TR1 this way:

TR1 (“Technical Report 1”) is a specification for new functionality being added to C++’s standard library. This functionality takes the form of new class and function templates for things like hash tables, reference-counting smart pointers, regular expressions, and more. TR1 itself is just a document.

The TR1 draft does not contain any background information on the functionality it provides and doesn’t contain any examples of how it should be used. For this sort of information, refer to the proposal documents which were used to define the TR1 functionality. The relevant proposal documents are nicely catalogued by Scott Meyers.

Microsoft have recently released a Beta version of the Visual Studio 2008 Feature Pack which includes support for most of the C++ standard library extensions described in TR1.

GCC v4.x also provides support for most of the extensions.

Over the next few weeks (more likely months) I plan on playing with the new TR1 features and I will add thoughts, learnings and code snippets to this blog.

Simple container dump using STL iterator

Saturday, July 14th, 2007

Quick and dirty printing of a container’s contents in C++ using the STL ostream_iterator …

#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <vector>

using namespace std;

int main()
{
    vector<int> v(10);
    generate(v.begin(), v.end(), rand);

    // Copy every element to cout, one per line
    copy(v.begin(), v.end(),
        ostream_iterator<int>(cout, "\n"));

    return 0;
}