Taking advantage of multithreaded environments with Ruby
If you have ever tried to write parallel scripts or applications using Ruby’s threads, you may have stumbled over the fact that some cores are sometimes left unused. You may be especially disappointed to notice that your brand new war machine, with 2 physical cores or more, is not fully exploited.
You would then be right to wonder whether Ruby and taking advantage of multicore (i.e. writing multithreaded applications) are compatible.
The simple answer is yes. Let’s dig into the more complicated answers: we’ll first explain how Ruby’s threads work, then explain why they sometimes seem to be broken, and eventually suggest four efficient solutions to get rid of their limitations.
Ruby threading models
As explained in my previous article about Ruby’s speed, the 1.8.x and 1.9.x branches of Ruby have coexisted since 2008. One of the most important performance differences between the two mainstream interpreters¹ relates to the threading model itself.
Prior to the 1.9.x branch, Ruby (with MRI) used green (user-level) threads. With this threading model, all of an application’s threads are mapped onto a single kernel-level scheduled entity (simply put, the application uses only one kernel thread); the kernel does not even know whether your application is multithreaded. This approach has several advantages, such as:
- providing threading on environments which do not support threading (it is handled by the application itself);
- reducing the cost of context-switching (switching between threads).
The choice of using green threads was made for Ruby 14 years ago, for compatibility reasons, at a time when many systems did not provide threading support.
But the major drawback of this threading model is that it cannot benefit from multi-core processors. It also means that one thread performing a blocking I/O operation may block all the others, because they share a single execution context.
With the release of YARV, Ruby 1.9.x now uses kernel threads: every single Ruby thread is mapped to a kernel thread and can run on a different core, which enables a single Ruby script to use more than one core.
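To see these threads at work, here is a minimal sketch using the standard Benchmark module: two threads each wait 0.2 seconds, and because the interpreter lets other threads run while one is sleeping, the waits overlap instead of adding up (exact timings will vary slightly on your machine).

```ruby
require 'benchmark'

# Two threads each wait 0.2 s; the waits overlap instead of running
# back to back, so the total elapsed time is close to 0.2 s, not 0.4 s.
elapsed = Benchmark.realtime do
  threads = 2.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)  # wait for both threads to finish
end

puts "total: #{elapsed.round(2)}s"
```
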
The Global Interpreter Lock (GIL)
The latter model sounds perfect and allows scripts to run more efficiently in a lot of concurrency situations.
But everything is not perfect in the shiny world of kernel threads: the Python and Ruby lovers among you may already have heard about the Global Interpreter Lock, a huge mutual exclusion lock (a.k.a. mutex) preventing multiple Ruby threads from using the interpreter at the same time.
Two Ruby threads therefore cannot run pure Ruby code (more precisely, use its API) at the same time. The problem is illustrated by running the following code:
```ruby
threads = 2.times.map do
  Thread.new do
    a = Array.new
    a << 1
    a.clear
  end
end
threads.each(&:join)  # wait for both threads to finish
```
This code starts two threads, each of which instantiates an array, appends the value 1 to it, and then empties it. Let’s schedule this by hand, in a purely didactic and fictitious way; red (resp. green) blocks represent periods of inactivity (resp. activity):
What is striking in this figure is the forced inactivity of thread B while thread A is running, and vice versa.
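This serialization can also be observed from Ruby itself. In the following hedged sketch (timings will vary with your machine, and the figures assume MRI), a CPU-bound, pure-Ruby workload run in two threads takes roughly as long as running it twice sequentially, because the GIL prevents the threads from computing in parallel:

```ruby
require 'benchmark'

work = proc { 2_000_000.times { |i| i * i } }  # pure Ruby, CPU-bound

# Run the workload twice in a row, then twice in concurrent threads.
sequential = Benchmark.realtime { 2.times { work.call } }
threaded   = Benchmark.realtime do
  2.times.map { Thread.new(&work) }.each(&:join)
end

# Under MRI's GIL, the threaded run is barely faster, if at all.
puts format('sequential: %.2fs, threaded: %.2fs', sequential, threaded)
```

On a GIL-free implementation such as JRuby, the threaded run would take roughly half the sequential time instead.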
The situation is different when pure C code is involved. As stated in the previous article, Ruby interacts easily with C. So let’s simply add a call to some arbitrary C code (provided that it explicitly releases the GIL, and therefore doesn’t use the Ruby API) right after the array instantiation, and the figure becomes:
Much more interesting! This clearly illustrates that the problem only occurs when the Ruby API is involved.
Getting rid of the Global Interpreter Lock (GIL)
There are four main solutions to the GIL restriction (two of them are the subject of in-depth articles, which will be linked directly below):
- Run your script on a different VM: for example, JRuby and Rubinius have gotten rid of the GIL, so you might want to try them out.
- Wait for a new mainstream VM: Koichi Sasada (the creator of YARV) announced that he may be working on a multi-VM version of the interpreter, in which the GIL would no longer exist… we’ll stay tuned about that!
- Write long and blocking parts in C: extensions can run in parallel, given one simple constraint: you must manually ask the Ruby VM to release the GIL for the current thread. After doing so, you must not call, directly or indirectly, any function of Ruby’s API, since this would lead to undefined behavior. Some gems take advantage of this solution, such as mysql2, which allows Ruby code to execute while waiting for the result of a MySQL query (the older mysql gem does not).
- Use separate processes: if you want to get rid of the GIL, run multiple interpreters (quite straightforward)! Ruby ships with a good IPC library named DRb, which makes distributing tasks really easy.
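As a taste of that last approach, here is a minimal DRb sketch (the `Adder` service and the port number are arbitrary examples); the server would normally live in a separate Ruby process, but running both ends in one script keeps the example self-contained:

```ruby
require 'drb/drb'

# A trivial service we want to expose to other interpreters.
class Adder
  def add(a, b)
    a + b
  end
end

# Server side: publish the object on a local port (port chosen arbitrarily).
DRb.start_service('druby://localhost:8787', Adder.new)

# Client side: obtain a proxy by URI and call methods on it transparently.
adder = DRbObject.new_with_uri('druby://localhost:8787')
result = adder.add(2, 3)
puts result  # prints 5

DRb.stop_service
```

Method calls on the proxy are forwarded over the wire, so each worker process can run its own interpreter, GIL included, while still cooperating on a shared task.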
This article tried to depict what you can expect from Ruby in terms of parallelism. It mainly aimed to convince you that green threads in Ruby 1.8 and the GIL in Ruby 1.9 are no show-stoppers, even if they require a little adaptation. If you’re curious about the last two solutions mentioned above, you are strongly encouraged to stay tuned for the upcoming in-depth articles.
1. We won’t cover other interpreters’ implementations in this article.