Multi-thread scaling issues with Python and Ruby

With the advent of multi-core processors, CPU bound applications need to use multi-threading in order to be able to scale their performance beyond that offered by a single core. This provides many challenges, but an interesting aspect of this problem is to consider how the threading modules in modern programming languages such as Python and Rubycan either help or hinder this scalability. Yes, there are plenty of other programming languages in use today, but Python and especially Ruby are rapidly rising in popularity and there are some surprising limitations to be aware of when using their threading packages.

The standard C implementation of Ruby 1.x (current version: 1.9) implements threading as green threading, where all threads are serviced by a single OS level thread and the Ruby runtime has full control over the thread life cycle. As described in the Ruby wiki, Ruby’s thread scheduler is a simple cooperative timeslicing scheduler with control switching to another thread if certain well defined keywords or events are encountered. There is also a 10ms timeout period to prevent too many context switches occurring (i.e. in general a max of 1 context switch every 10ms).

This use of green threads imposes severe scaling restrictions for Ruby applications that are CPU bound since the use of a single native OS thread limits the Ruby application to run on a single CPU core. IO bound Ruby applications can employ threading to a certain extent to parallelize waiting on IO operations but even this is limited by the 10ms minimum context switch time which has the effect of limiting the number of threads that can run within a Ruby application. Due to this limitation, scalability of Ruby applications appears to be solved today by splitting the application and running it in multiple processes which can then be run on different cores.

There is some hope in store though in that using native OS threads instead of green threads is being considered for Ruby 2.0 and there are some implementations of Ruby such as JRubywhich currently implement Ruby threads using native OS threads (via Java though for JRuby).

In contrast to Ruby, Python threads are implemented using native OS threads and so it is possible for different Python threads within a single application to run on different cores on a multi-core processor under the control of the OS scheduler. However, Python threading has a serious limitation in the form of the Global Interpreter Lock(GIL). This is a global lock that must be held by the current thread before it can safely access Python objects and only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions. In order to support multi-threaded python programs, the interpreter regularly releases and reacquires the lock – by default, every 100 bytecode instructions. C extensions can release and reacquire the lock using the Python API and so this offers some relief, but the lock must be acquired before the state of any Python object is accessed.

Similar to Ruby, this GIL effectively limits the performance of CPU bound Python applications to that of a single CPU core (since only one Python thread can run at a time). Scalability is available for IO bound applications as these can easily scale across cores and the “one at a time” model of the GIL does not significantly restrict the performance of threads that are highly IO bound. Some relief is available by being able to implement performance and lock optimized C extensions but this is very restrictive and cumbersome – certainly a lot harder than writing some Python code.

Given this serious restriction of the Python threading model, you would expect it to be possible to replace the GIL with more fine grained locking, but apparently it has been tried and there are some reasons why we can’t get rid of the global interpreter lock. When fine grained locking was tried as a patch to Python 1.5, a 2x slowdown was observed. The slowdown was attributed to the overhead of the acquiring/releasing the OS locks. This patch hasn’t been maintained for subsequent versions of Python. Another patch that is gaining popularity and actively being maintained is python-safethread. This is a set of Python extensions that is “intended to provide safe, easy, and scalable concurrency mechanisms. It focuses on local concurrency, not distributed or parallel programs.” While it is not yet part of the Python mainline but it is certainly a promising solution to the GIL issue.

Update: Thanks to Adam Olsen for pointing me towards python-safethread as a possible solution to the GIL.

Regular Expressions in C++

Up to now, regular expression support in C/C++ programs was achieved using third party or open source regular expression libraries such as the PCRE library. With the addition of regex support to the C++ standard library as part of the C++0x standard update, using regular expressions in C++ programs has become much simpler. This feature is included in the TR1 draft reportwhich has already been implemented in some popular compilers such as gcc and Visual Studio 2008 (as part of service pack 1).  

Six regular expression grammars will be supported in C++0x. The default is based upon the ECMAScript grammar specified in ECMA-262. This syntax is based upon the PCRE syntax and is used by languages such as Perl, Python and Ruby which also provide built in regular expression support. Other supported grammars include the POSIX regex syntax, and the syntaxes used in tools such as awk, grep and egrep.

Here are some examples that illustrate how to perform some basic tasks with the new C++ regex component of the standard library.

Header files and namespaces:

#include <regex>

using namespace std::tr1;

Finding a match:

regex rgx("ello");
assert(regex_search("Hello World", rgx));

The above example illustrates the construction of a regex object, with the regex pattern being passed as a parameter to the regex constructor. The regex object is a specialization of the basic_regex template for working with regular expressions which are provided using sequences of chars. The regex_search()function template is then used to see if the “Hello world” string contains the “ello” pattern. This function returns true as soon as the first matching substring is found. The regex_search()function is also overloaded to provide versions that take sequence iterators as params (instead of a full string) and also versions that provide additional info on the match results.

Note: The use of assert() in the examples is used to highlight the “contract” provided by the api – e.g. to highlight if a function can be used in a conditional expression and if the function should return true or false for the particular example.

Finding an exact match:
The regex_match() function template is an alternative to regex_search() and is used when the target sequence must exactly match the regular expression.

regex rgx("ello");
assert(regex_match("Hello World", rgx) == false);
assert(regex_match("ello", rgx));

Finding the position of a match:
The sub_match or match_results template is used to receive search results from regex_search(). When searching char data, the library provides a ready made specialization of match_results called cmatch.

regex rgx("llo");
cmatch result;
regex_search("Hello World", result, rgx);
cout << "Matched \"" << result.str()
    << "\" after \"" << result.prefix()
    << "\" at offset: " << result.position()
    << " with length: " << result.length()
    << endl;

Working with capture groups:
Capture groups provide a means for capturing matched regions within a regular expression. Each captured region is represented by a sub_match template object. The smatch specialization of match_results is provided by the library for working with sequences of string sub-matches.

string seq = "";
regex rgx("(.*)@(.*)");
smatch result;
regex_search(seq, result, rgx);
for(size_t i=0; i<result.size(); ++i)
    cout << result[i] << endl;

Case insensitive searches:

regex rgx("ello", regex_constants::icase);
assert(regex_search("HELLO WORLD", rgx));