Cryptography with Ruby

When looking at cryptographic support in Ruby I was pleased to see the breadth of cryptographic features provided by the openssl module. However, I was also surprised to see that performing simple operations such as encrypting or hashing a string required a number of interactions with the openssl module. Furthermore, using the API correctly and securely to perform these simple operations requires some care.

The Gibberish gem provides a nice wrapper around openssl and presents a very clean and simple API for performing simple operations such as AES encryption/decryption, digest generation and HMAC generation.

For example, encrypting some text with Gibberish:

# AES encryption (defaults to AES-256-CBC
cipher = Gibberish::AES.new("p4ssw0rd")
cipher.enc("Some top secret data")

Compared to the same operation with openssl:

c = OpenSSL::Cipher::Cipher.new("aes-256-cbc")
c.encrypt
salt = generate_salt
c.pkcs5_keyivgen("p4ssw0rd", salt, 1)
e = c.update("Some top secret data") + c.final
e = "Salted__#{salt}#{e}" #OpenSSL compatible

Some of the problems related to correctly using the openssl library has arisen from poor documentation in earlier versions of Ruby. Eric Hodel has put a lot of effort into improving this documentation in the latest versions of Ruby.

Further searching for Ruby cryptographic libraries yields a pure ruby implementation called crypt and another wrapper around openssl called ezcrypto. Crypt performance is lower than openssl due to the pure ruby implementation and the last update was a good while ago. Ezcrypto also seems functional but I prefer the style of the Gibberish API.

In summary, Gibberish seems to be the best option for common cryptographic operations with openssl as the fallback for any features not exposed through the Gibberish wrapper.

Passing options in Ruby

A neat idiom for passing options to a class in Ruby …

class WithOptions
  def initialize(opts = {})
    @options = {
      :debug => false
    }.merge(opts)
  end
end
foo = WithOptions.new(:debug => true)

# or ...

other_opts = {
  :debug => true,
  :debug_level => :WARN
}
foo = WithOptions.new(other_opts)

Any options passed will be merged with the default options setup in the class. Options with the same name will override the default options. The same technique can be used with methods.

An alternative approach is to specify the defaults directly in the default options hash parameter.

def some_method(opts = {:debug => false})
  puts opts[:debug]
end

countloc v0.4 released

In the absence of independent user feedback, a good way to approximate independent feedback is to write something, not look at it for a while and then try to read/use it. I was reminded of this recently when I tried to use my countloc app to gather some project LOC metrics. I had written this particular version about a year ago and hadn’t used it in a while. After dusting off the mothballs and invoking the help, it took me two attempts to get it to work properly. I was not impressed. I was also worried that the hundred or so other people that downloaded it had a similar experience of finding that it was not as usable as it should be.

I’ve hopefully fixed the usability problems in the latest version – countloc v0.4. I’ve also fixed a few bugs and added in some feature requests that I had received via the RubyForge page.

If you do give this app a twirl please provide feedback via the RubyForge area on any issues encountered.

Note: The app is implemented in Ruby and the download also contains the source code. I would be interested in hearing from any Ruby gurus out there on any suggestions for improving the code. I’ve set up a thread in the forum area for providing feedback.

CodeKata – Kata Eighteen Solution

Kata Eighteen requires some code to calculate transitive dependencies, i.e. how dependencies propagate between things. Here is my solution.

class Dependencies

def initialize()
@direct = Hash.new
end

def add_direct(item, dependants)
@direct[item] = dependants
end

def dependencies_for(item)
dependants = []
toBeProcessed = @direct[item].clone
while not toBeProcessed.empty?
x = toBeProcessed.shift
if x != item and not dependants.include?(x)
dependants << x toBeProcessed.concat(@direct[x]) if @direct.include?(x) end end return dependants.sort! end end [/sourcecode]

CodeKata – Kata Five Solution

Kata five requires the implementation of a Bloom filter. Here is my solution in Ruby using a non-optimized bitmap implementation and a reduced MD5 digest for the hashing functions.

class BloomFilter

def initialize(dictionary, num_bits, num_hashes)
@bitmap = Bitmap.new(num_bits)
@hashes = Array.new(num_hashes) { |i| BloomHash.new(i, num_bits-1) }

# compute the hash values for each word in the dictionary
# and set the appropriate bits in the bitmap
File.new(dictionary, ‘r’).each_line do |word|
@hashes.each { |h| @bitmap[h.compute(word.chomp)] = 1 }
end
end

def search(word)
@hashes.each { |h| return false if @bitmap[h.compute(word)] == 0 }
return true
end

end

class BloomHash

def initialize(seed, max_value)
@offset = seed
@num_digits = 0
@max_value = max_value
while(max_value > 0) do
@num_digits += 1
max_value >>= 4
end
end

def compute(word)
digest = Digest::MD5.hexdigest(word)
return digest[@offset, @num_digits].to_i(16) % @max_value
end

end

class Bitmap < Array def initialize(size) super(size) self.fill(0) end end [/sourcecode]

Multi-thread scaling issues with Ruby [update]

In my previous post on Multi-thread scaling issues with Python and Ruby, I had described the threading implementation within Ruby as using green threads. It seems that this has changed in Ruby 1.9 …

Ruby 1.9 threads do map to native threads, unfortunately the 1.9 interpreter forces user created threads to acquire a global mutex lock before executing. The upshot is that 1.9 thread execution is serialized and unable to benefit from multiple cores.

See Ruby in a multicore world for further discussion.

CodeKata – Kata Two Solution

Dave Thomas maintains a blog of “Code Kata’s”. He describes a code kata as being a practice session for practicing coding skills:

Each is a short exercise (perhaps 30 minutes to an hour long). Some involve programming, and can be coded in many different ways. Some are open ended, and involve thinking about the issues behind programming. These are unlikely to have a single correct answer.

Here are two of my solutions for kata two using Ruby. The first uses recursion and the second uses an iterative approach.

Recursive solution …

# Recursive implementation
def chop(target, values)
  # Special handling for zero and single element arrays
  return -1 if values.empty?

  return ((target == values[0]) ? 0 : -1) if values.length == 1

  # Try the bottom half first
  pos = chop(target, values[0, values.length/2])
  return pos if pos != -1

  # Then the upper half ... remember that the returned 
  # position is relative to the middle of the array.
  pos = chop(target, values[values.length/2, values.length-1])
  return pos + (values.length/2) if pos != -1

  # Didn't find what we were looking for
  return -1
end

Iterative solution …

# Iterative implementation
def chop(target, values)
start = 0
stop = values.length # stop indexes one past the end of the array

# loop until something is found or until we
# run out of elements to search
while start != stop
# Find the middle
mid = start + (stop – start) / 2

# Check to see if we got lucky
return mid if values[mid] == target

# Not this time … so lets see which half it might be in
if target < values[pos] stop = mid else start = mid + 1 end end # Didn't find what we were looking for return -1 end [/sourcecode]

Regular Expressions for Counting Lines of Code

One of the applications that I implement as a means of gaining experience in different programming languages is an application to generate code metrics. I implement the application in the chosen language and use it to count comments, lines of code etc. in source files of a variety of programming languages. Implementing this application is a useful learning exercise since it provides exposure to a variety of language features such as file handling, regular expressions and also csv and html generation (for presenting the output statistics). Correctly using regular expressions for finding source code comments right can be a tricky thing and this is the focus of this post. A Ruby based implementation of this application, called countloc, is hosted on RubyForge for reference.

A basic set of code metrics include:

  • lines of code 
  • comment lines
  • blank lines
  • total lines

For comment lines, there are three types of comments to worry about:

  • single line comments
  • multi-line comments
  • mixed comments and code in a single line

For multiline comments, it is necessary to detect the beginning and the end of the comment. For a mixture of comments and code on a single line, it is necessary to distinguish between lines containing only comments so that the “lines of code” count can be incremented. The set of regular expressions used by countloc for finding comments in Ruby, Python, Java, C++, C, C# code is shown below. Note that languages such as C, C++, C# and Java all share similar commenting styles and they are covered under the cplusplus category. Note also that escaping is required for encoding the //, /* and */ characters for C++ comments in Ruby regular expressions.

COMMENT_PATTERNS = {
  :ruby => {
    :single_line_full => /^\s*#/,
    :single_line_mixed => /#/,
    :multiline_begin => /=begin(\s|$)/,
    :multiline_end => /=end(\s|$)/,
    :blank_line => /^\s*$/
  },

  :python => {
    :single_line_full => /^\s*#/,
    :single_line_mixed => /#/,
    :multiline_begin => /^\s*"""/,
    :multiline_end => /"""/,
    :blank_line => /^\s*$/
  },

  :cplusplus => {
    :single_line_full => /^\s*\/\//, 
    :single_line_mixed => /\/\//,
    :multiline_begin => /\/\*/,
    :multiline_end => /\*\/\s*$/,
    :multiline_begin_mixed => /^[^\s]+.*\/\*/,
    :multiline_end_mixed => /\*\/\s*[^\s]+$/,
    :blank_line => /^\s*$/
  }
}

As each line of source code is read, the above regular expressions are evaluated in the following order with the convention that as soon as a match is found, no further regular expressions need to be evaluated … unless otherwise noted.

  1. multiline_end_mixed (if within a multiline comment) 
  2. multiline_end (if within a multiline comment)
  3. multiline_begin … then check for multiline_begin_mixed and multiline_end (to check for a complete  multiline comment on a single line!)
  4. single_line_full
  5. single_line_mixed
  6. blank_line

Based on the countloc implementation and the code samples used to test it, this order appears to be sufficient. However, I sometimes find a corner case not covered by the regular expressions or evaluation order given above, so I am interested in any feedback with suggested improvements or optimizations.

Creating a RubyGem Package

Ruby Gems are the standard way of distributing Ruby applications and library classes. Information on RubyGems can be found at http://www.rubygems.org.

Packaging a Ruby application or library class into a Gem package is fairly straightforward but finding documentation on it wasn’t. The Gem Spec Reference was useful, but didn’t really provide everything that was needed to build a gem.

After some searching, I came across the a more concise guide.