Multi-thread scaling issues with Python and Ruby

With the advent of multi-core processors, CPU-bound applications need to use multi-threading to scale their performance beyond that offered by a single core. This presents many challenges, but one interesting aspect of the problem is how the threading modules in modern programming languages such as Python and Ruby can either help or hinder this scalability. Yes, there are plenty of other programming languages in use today, but Python and especially Ruby are rapidly rising in popularity, and there are some surprising limitations to be aware of when using their threading packages.

RUBY
The standard C implementation of Ruby 1.x (current version: 1.9) implements threading as green threading, where all threads are serviced by a single OS-level thread and the Ruby runtime has full control over the thread life cycle. As described in the Ruby wiki, Ruby’s thread scheduler is a simple cooperative timeslicing scheduler that switches control to another thread when certain well-defined keywords or events are encountered. There is also a 10 ms timeout period to prevent too many context switches from occurring (i.e., in general, at most one context switch every 10 ms).

This use of green threads imposes severe scaling restrictions on CPU-bound Ruby applications, since the use of a single native OS thread limits the Ruby application to a single CPU core. IO-bound Ruby applications can employ threading to a certain extent to parallelize waiting on IO operations, but even this is limited by the 10 ms minimum context switch time, which in effect limits the number of threads that can usefully run within a Ruby application. Because of this limitation, Ruby applications today appear to scale by splitting the application into multiple processes, which can then run on different cores.

There is some hope in store, though: using native OS threads instead of green threads is being considered for Ruby 2.0, and some implementations of Ruby, such as JRuby, already implement Ruby threads using native OS threads (via Java threads, in JRuby’s case).

PYTHON
In contrast to Ruby, Python threads are implemented using native OS threads, so it is possible for different Python threads within a single application to run on different cores of a multi-core processor under the control of the OS scheduler. However, Python threading has a serious limitation in the form of the Global Interpreter Lock (GIL). This is a global lock that must be held by the current thread before it can safely access Python objects; only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions. To support multi-threaded Python programs, the interpreter regularly releases and reacquires the lock (by default, every 100 bytecode instructions). C extensions can release and reacquire the lock using the Python API, which offers some relief, but the lock must be reacquired before the state of any Python object is accessed.
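The periodic release of the GIL can be inspected from Python itself. The 100-bytecode figure above matches the interpreters current when this was written; CPython 3.2 and later replaced the instruction count with a time-based switch interval, which is what the sketch below reads. The helper name `gil_switch_interval_ms` is made up for this illustration.

```python
import sys


def gil_switch_interval_ms():
    """Return how often CPython asks the running thread to give up the GIL.

    Assumes CPython 3.2 or later, where the hand-off is time-based.
    Older interpreters exposed a bytecode count instead.
    """
    return sys.getswitchinterval() * 1000.0


if __name__ == "__main__":
    print(f"GIL switch interval: {gil_switch_interval_ms():.1f} ms")
```

The interval can be changed with `sys.setswitchinterval()`, but tuning it only changes how often threads take turns; it does not let two threads execute Python bytecode at the same time.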

Similar to Ruby, the GIL effectively limits the performance of CPU-bound Python applications to that of a single CPU core (since only one Python thread can run at a time). Scalability is available for IO-bound applications, since the “one at a time” model of the GIL does not significantly restrict threads that spend most of their time waiting on IO. Some relief is available by implementing performance-critical code as C extensions that release the lock, but this is restrictive and cumbersome, and certainly a lot harder than writing some Python code.
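The difference between the two kinds of workload can be seen with a small timing sketch. It assumes a standard GIL-enabled CPython build; the function names and workload sizes are arbitrary choices for illustration, and `time.sleep` stands in for real blocking IO.

```python
import threading
import time


def cpu_task(n):
    """Pure-Python arithmetic: the thread needs the GIL for every step."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_task(seconds):
    """Stand-in for blocking IO: sleeping releases the GIL while waiting."""
    time.sleep(seconds)


def run_in_threads(target, args, workers):
    """Run target(*args) in `workers` threads and return elapsed wall time."""
    threads = [threading.Thread(target=target, args=args) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def run_sequentially(target, args, repeats):
    """Run target(*args) `repeats` times in one thread and return elapsed time."""
    start = time.perf_counter()
    for _ in range(repeats):
        target(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    workers = 4
    print("CPU-bound, sequential:", run_sequentially(cpu_task, (500_000,), workers))
    print("CPU-bound, threaded:  ", run_in_threads(cpu_task, (500_000,), workers))
    print("IO-bound, sequential: ", run_sequentially(io_task, (0.2,), workers))
    print("IO-bound, threaded:   ", run_in_threads(io_task, (0.2,), workers))
```

On a GIL-enabled interpreter, the threaded CPU-bound run should take roughly as long as the sequential one regardless of core count, while the threaded IO-bound run should finish in about the time of a single wait. Exact numbers depend on the machine.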

Given this serious restriction of the Python threading model, you would expect it to be possible to replace the GIL with more fine-grained locking, but apparently it has been tried, and there are some reasons why we can’t get rid of the global interpreter lock. When fine-grained locking was tried as a patch to Python 1.5, a 2x slowdown was observed. The slowdown was attributed to the overhead of acquiring and releasing the OS locks. This patch hasn’t been maintained for subsequent versions of Python. Another patch that is gaining popularity and is actively maintained is python-safethread. This is a set of Python extensions that is “intended to provide safe, easy, and scalable concurrency mechanisms. It focuses on local concurrency, not distributed or parallel programs.” It is not yet part of the Python mainline, but it is certainly a promising solution to the GIL issue.
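To get a feel for why per-object locking can be expensive, the sketch below times the same loop with and without a lock taken around each tiny operation. This is not the Python 1.5 patch and it makes no attempt to reproduce the 2x figure; it is a single-threaded, uncontended microbenchmark with made-up function names, meant only to show that lock traffic has a cost even when nobody is competing for the lock.

```python
import threading
import time


def count_without_lock(n):
    """Increment a counter n times with no locking at all."""
    counter = 0
    for _ in range(n):
        counter += 1
    return counter


def count_with_lock(n):
    """Increment a counter n times, taking a lock around every increment."""
    lock = threading.Lock()
    counter = 0
    for _ in range(n):
        with lock:  # acquire and release on every single operation
            counter += 1
    return counter


def timed(func, n):
    """Return (result, elapsed seconds) for func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    _, plain = timed(count_without_lock, n)
    _, locked = timed(count_with_lock, n)
    print(f"no lock: {plain:.3f} s, lock per increment: {locked:.3f} s")
```

Both functions compute the same result; only the elapsed time differs, and the ratio will vary by interpreter version and platform.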

Update: Thanks to Adam Olsen for pointing me towards python-safethread as a possible solution to the GIL.

10 Responses to “Multi-thread scaling issues with Python and Ruby”

  1. Suren says:

    Have provided a link to this post on http://www.multicoreinfo.com , a hub for multicore related news and resources.

  2. Adam Olsen says:

    Take a look at https://launchpad.net/python-safethread . The GIL is gone, and although there’s still a significant cost, the resulting performance is scalable (minimizing contention between threads.)

    Of course the Multicore Crisis is all about ease of programming, not simply performance, and that’s the real target of python-safethread. GIL removal is just a bonus.

  3. Steve Doyle says:

    Thanks – I didn’t know about the safethread extension. I’ll update the post to point to it. Do you know if there are any plans to incorporate this into the Python mainline?

  4. Adam Olsen says:

    Nothing official. I definitely want to see it merged, but it’s a pretty massive change, and won’t go in quietly.

    The project is maturing though, and I’m reaching the point where it starts becoming usable on its own. Community support could play a large role into getting it accepted upstream.

  5. Paddy3118 says:

    You write:

    With the advent of multi-core processors, CPU bound applications need to use multi-threading in order to be able to scale their performance beyond that offered by a single core.

    You don’t mention multi-process parallelism?

    - Paddy.

  7. Tom Willis says:

    Someone better tell youtube.com that python can’t scale before their site gets too big, i gotta have my videos.

  8. Yeah with Ruby even 1.9 you’re still locked to a single core. Only way to overcome this currently is running multiple processes side by side. Until…who knows when. I guess there are some other languages that overcome this, but not Ruby, yet.

  9. Jesse says:

    In addition to safethread, there’s the work going on in 2.6/3.0 courtesy of pep-371 (http://python.org/dev/peps/pep-0371/) to add in a threading-like interface to process-based threading. Adam’s right though – “Of course the Multicore Crisis is all about ease of programming, not simply performance”.

    Threads, as they sit today in most languages are a pain to get right no matter what.