Rails 3 Patch: Encoding Bug while Action Caching with MemCacheStore
While upgrading an application from Rails 2.3 and Ruby 1.8.7 to Rails 3.0 and Ruby 1.9.2 I ran into the following error message:
Encoding::CompatibilityError: incompatible encoding regexp match
(ASCII-8BIT regexp with UTF-8 string)
After some searching around I was able to determine that this was a
MemCacheStore
issue. In particular it was caused when trying to perform action
caching on a URL that contained special UTF-8 encoded characters, like the
umlaut. As luck would have it
someone else was experiencing this same issue, but their lighthouse
ticket had been around for a good two months without any activity. I guess not
too many Ruby on Rails developers are performing action caching on pages with
UTF-8 encoded parameters. Anyways, I took it upon myself to create a patch to
solve this issue.
The bug was located in ActiveSupport’s mem_cache_store.rb
on line 33, where
the following regular expression was defined:
By itself this doesn’t seem like much of an issue, but what you might not
realize is that the Rails source code is encoded in US-ASCII, so when your UTF-8
encoded string is gsubbed with this regular expression, you get an
Encoding::CompatibilityError
on the first line of the
ActiveSupport::Cache::MemCacheStrore#escape_key
function:
My solution to this problem was to change the escape_key function so that it was compatible with any encoded string. I decided that the simplest solution would be to force the memcached key into the same encoding as the regular expression, but to do that I would need to duplicate the key string, because performing a force_encoding on a string actually changes the string in memory, it doesn’t create a new string. If the string is not duplicated, the original key will get sent back to the Rails application with a different encoding and this can cause issues if you perform any concatenation with that string, since most of the strings in your app will probably be encoded in UTF-8. Try this out for yourself in you console:
Once the string was encoded in ASCII-8BIT, it was compatible with the regular expression and no error would be raised. Here’s the final function I came up with:
Please give my patch a try, it includes tests for all encoding types and special characters. If you get your tests passing please leave a comment on the lighthouse ticket that you were able to verify the fix.
Update - This patch has been merged into the rails branch as of April 28, 2011.