Cryptography with Ruby

When looking at cryptographic support in Ruby I was pleased to see the breadth of cryptographic features provided by the openssl module. However, I was also surprised to see that performing simple operations such as encrypting or hashing a string required a number of interactions with the openssl module. Furthermore, using the API correctly and securely to perform these simple operations requires some care.

The Gibberish gem provides a nice wrapper around openssl and presents a clean, simple API for common operations such as AES encryption/decryption, digest generation and HMAC generation.

For example, encrypting some text with Gibberish:

# AES encryption (defaults to AES-256-CBC)
cipher = Gibberish::AES.new("p4ssw0rd")
cipher.enc("Some top secret data")

Compared to the same operation with openssl:

require 'openssl'

c = OpenSSL::Cipher.new("aes-256-cbc")
c.encrypt
salt = OpenSSL::Random.random_bytes(8) # the OpenSSL format uses an 8-byte salt
c.pkcs5_keyivgen("p4ssw0rd", salt, 1)
e = c.update("Some top secret data") + c.final
e = "Salted__".b + salt + e # OpenSSL-compatible output
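Decryption with raw openssl is similarly verbose: the 8-byte salt has to be recovered from the "Salted__" header and the key/IV re-derived before reversing the operation. A minimal round-trip sketch (variable names are my own):

```ruby
require 'openssl'

password = "p4ssw0rd"
data     = "Some top secret data"

# Encrypt, producing OpenSSL-compatible "Salted__<salt><ciphertext>" output
salt = OpenSSL::Random.random_bytes(8)
c = OpenSSL::Cipher.new("aes-256-cbc")
c.encrypt
c.pkcs5_keyivgen(password, salt, 1)
encrypted = "Salted__".b + salt + c.update(data) + c.final

# Decrypt: skip the 8-byte magic, recover the 8-byte salt,
# re-derive the key/IV and reverse the operation
recovered_salt = encrypted[8, 8]
ciphertext     = encrypted[16..-1]
d = OpenSSL::Cipher.new("aes-256-cbc")
d.decrypt
d.pkcs5_keyivgen(password, recovered_salt, 1)
plaintext = d.update(ciphertext) + d.final
```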

Some of the problems related to correctly using the openssl library have arisen from poor documentation in earlier versions of Ruby. Eric Hodel has put a lot of effort into improving this documentation in the latest versions of Ruby.

Further searching for Ruby cryptographic libraries yields a pure Ruby implementation called crypt and another wrapper around openssl called ezcrypto. Crypt's performance is lower than openssl's due to the pure Ruby implementation, and its last update was a good while ago. Ezcrypto also seems functional, but I prefer the style of the Gibberish API.

In summary, Gibberish seems to be the best option for common cryptographic operations with openssl as the fallback for any features not exposed through the Gibberish wrapper.

Passing options in Ruby

A neat idiom for passing options to a class in Ruby …

class WithOptions
  def initialize(opts = {})
    @options = {
      :debug => false
    }.merge(opts)
  end
end
foo = WithOptions.new(:debug => true)

# or ...

other_opts = {
  :debug => true,
  :debug_level => :WARN
}
foo = WithOptions.new(other_opts)

Any options passed will be merged with the default options set up in the class. Options with the same name override the defaults. The same technique can be used with methods.
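As a sketch of the method form (the log method and its :level option are made up for illustration):

```ruby
# Merge caller-supplied options over the defaults inside the method
def log(message, opts = {})
  opts = { :level => :INFO }.merge(opts)
  "#{opts[:level]}: #{message}"
end

log("starting up")                  # => "INFO: starting up"
log("disk full", :level => :WARN)   # => "WARN: disk full"
```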

An alternative approach is to specify the defaults directly in the default options hash parameter.

def some_method(opts = {:debug => false})
  puts opts[:debug]
end
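One caveat worth sketching: with a default-hash parameter, passing any options replaces the defaults wholesale instead of merging them:

```ruby
def some_method(opts = {:debug => false})
  opts[:debug]
end

some_method()                  # => false (the default hash is used)
some_method(:verbose => true)  # => nil (the :debug default is lost)
```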

7 Deadly Sins of Design

Reproduced from a paper on object-oriented principles by Arnon Rotem-Gal-Oz.

7 deadly design sins to avoid:

  1. Rigidity. Make it hard to change, especially if changes might result in ripple effects or when you don’t know what will happen when you make changes.
  2. Fragility. Make it easy to break. Whenever you change something, something breaks.
  3. Immobility. Make it hard to reuse. When something is coupled to everything it uses. When you try to take a piece of code (class etc.) it takes all of its dependencies with it.
  4. Viscosity. Make it hard to do the right thing. There are usually several ways to work with a design. Viscosity happens when it is hard to work with the design the way the designer intended. The result is tricks and workarounds that often have unexpected outcomes (especially if the design is also fragile).
  5. Needless Complexity. Over-design. When you overdo it, e.g. the “Swiss-Army knife” antipattern: a class that tries to anticipate every possible need. Another example is applying too many patterns to a simple problem.
  6. Needless Repetition. The same code is scattered about which makes it error prone.
  7. Not doing any design.

Post-Increment vs Pre-Increment

There is a well known C++ BKM for preferring pre-increment (++i) over post-increment (i++). The rationale is that when post-incrementing an object, the object must increment itself and then return a temporary containing its old value. While this makes perfect sense for complex types (e.g. STL iterators), I’ve always wondered if it holds true for built-in types such as int. Let’s find out.

First let’s create a trivial example to examine:

#include <iostream>

using namespace std;

int main()
{
	// post-increment
	for(int i=0; i<10; i++)
	{
		cout << "[" << i << "]" << endl;
	}

	// pre-increment
	for(int i=0; i<10; ++i)
	{
		cout << "[" << i << "]" << endl;
	}

	return 0;
}

Now using the Visual Studio 2008 C++ compiler, let’s take a look at the generated assembly code. First let’s keep things simple and disable compiler optimizations (/Od):

	// post-increment
	for(int i=0; i<10; i++)
004114D3  mov         dword ptr [i],0 
004114D5  jmp         main+30h (4114E0h) 
004114D7  mov         eax,dword ptr [i] 
004114DA  add         eax,1 
004114DD  mov         dword ptr [i],eax 
004114E0  cmp         dword ptr [i],0Ah 
004114E4  jge         main+86h (411536h) 
	{

<... snip ...>

	// pre-increment
	for(int i=0; i<10; ++i)
00411536  mov         dword ptr [i],0 
0041153D  jmp         main+98h (411548h) 
0041153F  mov         eax,dword ptr [i] 
00411542  add         eax,1 
00411545  mov         dword ptr [i],eax 
00411548  cmp         dword ptr [i],0Ah 
0041154C  jge         main+0EEh (41159Eh) 
	{

The relevant parts of the assembly code are the increment, compare and jump instructions. As you can see, there is no difference between the generated code for post-increment and pre-increment.

Now, with compiler optimizations enabled (/O2):

	// post-increment
	for(int i=0; i<10; i++)
00401008  xor         esi,esi 
0040100A  lea         ebx,[ebx] 
	{
<... snip ...>
0040104A  inc         esi  
0040104B  cmp         esi,0Ah 
0040104E  jl          main+10h (401010h) 
	}

	// pre-increment
	for(int i=0; i<10; ++i)
00401050  xor         esi,esi 
	{
<... snip ...>
0040108C  inc         esi  
0040108D  cmp         esi,0Ah 
00401090  jl          main+52h (401052h) 
	}

Again there is no difference in the generated assembly between post-increment and pre-increment.

So, based on this quick experiment, there is no difference between pre-increment and post-increment for int types when using the Visual Studio 2008 C++ compiler with optimizations enabled or disabled.

Note that this doesn’t change the BKM of preferring pre-increments over post-increments when using integer types. The above experiment was only conducted with a single compiler – other compilers may behave differently, so it is best to err on the side of caution and use pre-increments where possible. The use of pre-increment also adds some future proofing: if the type of the variable being incremented is later changed to a more complex type, pre-increment becomes the optimal choice.

countloc v0.4 released

In the absence of independent user feedback, a good way to approximate it is to write something, not look at it for a while, and then try to read/use it. I was reminded of this recently when I tried to use my countloc app to gather some project LOC metrics. I had written this particular version about a year ago and hadn’t used it in a while. After dusting off the mothballs and invoking the help, it took me two attempts to get it to work properly. I was not impressed. I was also worried that the hundred or so other people who had downloaded it had a similar experience of finding it not as usable as it should be.

I’ve hopefully fixed the usability problems in the latest version – countloc v0.4. I’ve also fixed a few bugs and added in some feature requests that I had received via the RubyForge page.

If you do give this app a twirl, please provide feedback on any issues encountered via the RubyForge area.

Note: The app is implemented in Ruby and the download also contains the source code. I would be interested in hearing from any Ruby gurus out there on any suggestions for improving the code. I’ve set up a thread in the forum area for providing feedback.

CodeKata – Kata Eighteen Solution

Kata Eighteen requires some code to calculate transitive dependencies, i.e. how dependencies propagate between things. Here is my solution.

class Dependencies

  def initialize
    @direct = Hash.new
  end

  def add_direct(item, dependants)
    @direct[item] = dependants
  end

  def dependencies_for(item)
    dependants = []
    toBeProcessed = @direct[item].clone
    while not toBeProcessed.empty?
      x = toBeProcessed.shift
      if x != item and not dependants.include?(x)
        dependants << x
        toBeProcessed.concat(@direct[x]) if @direct.include?(x)
      end
    end
    return dependants.sort!
  end

end

CodeKata – Kata Five Solution

Kata five requires the implementation of a Bloom filter. Here is my solution in Ruby using a non-optimized bitmap implementation and a reduced MD5 digest for the hashing functions.

require 'digest/md5'

class BloomFilter

  def initialize(dictionary, num_bits, num_hashes)
    @bitmap = Bitmap.new(num_bits)
    @hashes = Array.new(num_hashes) { |i| BloomHash.new(i, num_bits-1) }

    # compute the hash values for each word in the dictionary
    # and set the appropriate bits in the bitmap
    File.new(dictionary, 'r').each_line do |word|
      @hashes.each { |h| @bitmap[h.compute(word.chomp)] = 1 }
    end
  end

  def search(word)
    @hashes.each { |h| return false if @bitmap[h.compute(word)] == 0 }
    return true
  end

end

class BloomHash

  def initialize(seed, max_value)
    @offset = seed
    @num_digits = 0
    @max_value = max_value
    while(max_value > 0) do
      @num_digits += 1
      max_value >>= 4
    end
  end

  def compute(word)
    digest = Digest::MD5.hexdigest(word)
    return digest[@offset, @num_digits].to_i(16) % @max_value
  end

end

class Bitmap < Array

  def initialize(size)
    super(size)
    self.fill(0)
  end

end
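The reduced-MD5 hashing trick can be demonstrated standalone. The bloom_hashes helper below is hypothetical (it is not part of the solution above): it derives several bit positions from a single MD5 digest by reading hex slices at different offsets, reduced modulo the bitmap size:

```ruby
require 'digest/md5'

# Derive num_hashes bit positions for a word from a single MD5 digest.
# Each "hash" reads a slice of hex digits starting at a different offset.
def bloom_hashes(word, num_hashes, num_bits)
  digest = Digest::MD5.hexdigest(word)
  num_digits = (num_bits - 1).to_s(16).length # hex digits needed to cover the range
  (0...num_hashes).map do |seed|
    digest[seed, num_digits].to_i(16) % num_bits
  end
end

bloom_hashes("ruby", 3, 1024) # three deterministic positions in 0...1024
```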

Multi-thread scaling issues with Ruby [update]

In my previous post on Multi-thread scaling issues with Python and Ruby, I had described the threading implementation within Ruby as using green threads. It seems that this has changed in Ruby 1.9 …

Ruby 1.9 threads do map to native threads; unfortunately, the 1.9 interpreter forces user-created threads to acquire a global mutex lock before executing. The upshot is that 1.9 thread execution is serialized and unable to benefit from multiple cores.

See Ruby in a multicore world for further discussion.

CodeKata – Kata Two Solution

Dave Thomas maintains a blog of “Code Katas”. He describes a code kata as a practice session for honing coding skills:

Each is a short exercise (perhaps 30 minutes to an hour long). Some involve programming, and can be coded in many different ways. Some are open ended, and involve thinking about the issues behind programming. These are unlikely to have a single correct answer.

Here are two of my solutions for kata two using Ruby. The first uses recursion and the second uses an iterative approach.

Recursive solution …

# Recursive implementation
def chop(target, values)
  # Special handling for zero and single element arrays
  return -1 if values.empty?

  return ((target == values[0]) ? 0 : -1) if values.length == 1

  # Try the bottom half first
  pos = chop(target, values[0, values.length/2])
  return pos if pos != -1

  # Then the upper half ... remember that the returned 
  # position is relative to the middle of the array.
  pos = chop(target, values[values.length/2, values.length-1])
  return pos + (values.length/2) if pos != -1

  # Didn't find what we were looking for
  return -1
end

Iterative solution …

# Iterative implementation
def chop(target, values)
  start = 0
  stop = values.length # stop indexes one past the end of the array

  # loop until something is found or until we
  # run out of elements to search
  while start != stop
    # Find the middle
    mid = start + (stop - start) / 2

    # Check to see if we got lucky
    return mid if values[mid] == target

    # Not this time ... so let's see which half it might be in
    if target < values[mid]
      stop = mid
    else
      start = mid + 1
    end
  end

  # Didn't find what we were looking for
  return -1
end