Including and extending modules in Ruby
One of the coolest features in Ruby is the existence of modules and the possibility of including their implementation in any object. This simple behavior is the source of things like the Enumerable module, which gives you a bunch of methods to work with a collection of objects and only expects the class that includes it to define an “each” method. You write a class, define an “each” method, include Enumerable and you’re done: all Enumerable methods are available to you.
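To make that concrete, here is a minimal sketch. The Playlist class and its songs are made up for illustration; the only thing Enumerable actually asks of the class is the “each” method.

```ruby
# Hypothetical class, used only to illustrate the Enumerable contract.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # The single method Enumerable needs from us: hand each element to the block.
  def each(&block)
    @songs.each(&block)
  end
end

playlist = Playlist.new('Help!', 'Yesterday', 'Blackbird')

# None of these methods are defined in Playlist, they all come from Enumerable.
puts playlist.map { |song| song.upcase }.inspect
puts playlist.select { |song| song.length > 5 }.inspect
puts playlist.include?('Yesterday')
puts playlist.sort.inspect
```

Everything after the class definition works only because “each” exists; remove it and every one of those calls fails.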
Another example is the Comparable module. When you include Comparable in your class, you must define the <=> operator (the UFO operator), and the Comparable module will give you the implementation of the following operators/methods:
< , <= , > , >= , ==, between?
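Here is a small sketch of that contract. The Version class is a made-up example; the only method it defines for comparison is <=>, everything else comes from Comparable.

```ruby
# Hypothetical class, used only to illustrate the Comparable contract.
class Version
  include Comparable

  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # The UFO operator: returns -1, 0 or 1. Arrays already know how to
  # compare themselves element by element, so we delegate to them.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

old_version = Version.new(1, 9)
new_version = Version.new(2, 0)

puts old_version < new_version
puts old_version == Version.new(1, 9)
puts Version.new(1, 10).between?(old_version, new_version)
```

Note that == here comes from Comparable too, so two different Version objects with the same numbers are considered equal.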
This is why we usually call them mixins: they are “mixing in” their behaviors (their methods/messages) into our objects. Mixins serve a purpose similar to that of multiple inheritance, which is to inherit an implementation from “something” without having to be a direct child of that “something”. With multiple inheritance you would be able to inherit from as many classes as you wanted. In Ruby we don’t have multiple inheritance, but we can include as many modules as we want, so they give us much the same feature without all the hassle that multiple inheritance usually brings to a language.
The method resolution mechanism is pretty simple. First, if a method in a module that is being included is already defined in the class that is including it, the method of the class has precedence (which means that the method on the module is shadowed and will not be called directly). If two modules define a method with the same name, the method on the last module included will be the one available in the class that has included both modules (remember that in Ruby there is no method overloading mechanism, so only the name matters, not the number of parameters). Here’s an example of how it works:
module SimpleModule
  def a_method
    puts 'a_method at module'
  end

  def another_method( parameter )
    puts "Calling another method with parameter -> #{parameter}"
  end
end

module AnotherModule
  def another_method
    puts 'Calling another method without a parameter'
  end
end

class SimpleClass
  include SimpleModule
  include AnotherModule

  def a_method( param )
    puts "a_method at class -> #{param}"
  end
end

instance = SimpleClass.new
#calling the method defined on the class
instance.a_method 'parameter'
#calling method on the AnotherModule
instance.another_method
#this line will throw a 'wrong number of arguments' error
instance.a_method
An ugly example for an ugly practice. Don’t rely on these things when you’re writing your own modules; strive to create modules whose method names aren’t going to clash when they are included in other classes. If you have to rely on these rules to write and use your modules, maybe there is a problem in your code or in what you’re trying to do.
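If you ever need to check which definition wins, Ruby can tell you. The sketch below uses made-up names (Greeter, LoudGreeter, Person) and relies on two standard features: the “ancestors” method, which lists the lookup order, and “super”, which calls the next definition in that order.

```ruby
# Hypothetical modules and class, used only to show the lookup order.
module Greeter
  def greet
    'hello from Greeter'
  end
end

module LoudGreeter
  def greet
    'HELLO FROM LOUDGREETER'
  end
end

class Person
  include Greeter
  include LoudGreeter # included last, so it is searched before Greeter

  # The class definition is found first; super continues the search
  # in the next entry of the ancestors list.
  def greet
    "Person first, then: #{super}"
  end
end

puts Person.ancestors.first(3).inspect
puts Person.new.greet
```

The class comes first in the list, followed by the modules in reverse order of inclusion, which is exactly the precedence described above.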
Extending modules
As the title of this post says, you can include and also extend modules, but what does it mean to extend a module?
When you extend a module, you are adding the methods of that specific module to the object instance on which you call “extend”. So, the methods of that module will only be available on that specific instance; other objects of the same class will not have them. With this, you can add specific behaviors to just one object of your system, without changing the other ones. Here’s an example:
module InstanceMethods
  def simple_method
    puts "im a method that belongs to an instance"
  end
end

class SimpleObject
end

object = SimpleObject.new
object.extend InstanceMethods
object.simple_method
another_object = SimpleObject.new
#the following line will throw a NoMethodError, as this instance doesn't extend the module
another_object.simple_method
This might look like a weird feature. How many times have you wanted to introduce a method into a single object?
Not that many, probably, unless this instance is in fact an instance of the Class class, that is, the object that represents your class and holds its class methods. This is where extending modules gets interesting, and it is how many of the Rails plugins are written. Let’s see how we can use this to write our own acts_as_votable plugin.
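Before jumping into Rails, here is the same idea in plain Ruby. The Describable module and the Report class are made-up names; the point is only that a class is an object too, so calling “extend” on it gives that one object (the class) new methods.

```ruby
# Hypothetical module and class, used only to show extend on a class object.
module Describable
  def describe
    # name is a regular method of Class, so self here is the class itself.
    "#{name} gained describe as a class method"
  end
end

class Report
  extend Describable
end

# The method lives on the class object...
puts Report.describe
# ...and not on the instances of that class.
puts Report.new.respond_to?(:describe)
```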
Rails, extending and including modules
First thing to do is create your Rails project:
rails --database=mysql include_extend_modules
With the project created, we have to create our plugin (run this from inside your Rails project folder):
script/generate plugin acts_as_votable
This will create a folder called acts_as_votable at the vendor/plugins folder and the plugin skeleton code. The first thing to do is to create our Vote model. It’s a dead simple model, with a polymorphic relationship with a “votable” and a boolean column called “up”, representing if this vote is “up” or “down”. The vote.rb file should live at the vendor/plugins/acts_as_votable/lib folder. Here’s the model code:
#vendor/plugins/acts_as_votable/lib/vote.rb
class Vote < ActiveRecord::Base
  belongs_to :votable, :polymorphic => true
  validates_presence_of :votable
end
Now we have to create a migration to create the votes table at the database:
script/generate migration create_votes
And here is the migration code:
class CreateVotes < ActiveRecord::Migration
  def self.up
    create_table :votes do |t|
      t.integer :votable_id, :null => false
      t.string :votable_type, :limit => 15, :null => false
      t.boolean :up, :default => false, :null => false
      t.timestamps
    end
    add_index :votes, [ :votable_id, :votable_type ]
  end

  def self.down
    drop_table :votes
  end
end
After creating the Vote model and its migration, we’ll head to the acts_as_votable.rb file in our plugin folder. It’s where the code that ties the Vote model to the application will live. Here’s the code that goes in there:
#vendor/plugins/acts_as_votable/lib/acts_as_votable.rb
module ActsAsVotable
  module ClassMethods
    def acts_as_votable
      has_many :votes, :as => :votable, :dependent => :delete_all
      include InstanceMethods
    end
  end

  module InstanceMethods
    def cast_vote( vote )
      Vote.create( :votable => self, :up => vote == :up )
    end
  end
end
We have created a module called ActsAsVotable to serve as our namespace, and in it we have two modules, ClassMethods and InstanceMethods. The ClassMethods module defines the methods that we want to introduce into the ActiveRecord::Base class, so that we can just call “acts_as_votable” in any model that inherits from ActiveRecord::Base (just like any other ActiveRecord plugin). The InstanceMethods module contains the methods that we want an instance that is “votable” to have.
So, if I say that a NewsArticle class is votable, its instances will have the cast_vote method, as the InstanceMethods module was included when the class called acts_as_votable. But before creating the NewsArticle model, we have to make some changes to the init.rb file of the acts_as_votable plugin. Here’s how it should look:
#vendor/plugins/acts_as_votable/init.rb
require 'vote'
require 'acts_as_votable'
ActiveRecord::Base.extend ActsAsVotable::ClassMethods
This is where we make the acts_as_votable method available to all classes that inherit from ActiveRecord::Base, and it is one of the most common uses of “extending” modules you will see: adding the methods of a module to a “class” object. Making a class object extend a module makes the module’s methods available on that class object itself, which means they are now “class methods” of that class. In our example, “acts_as_votable” is now a class method of the ActiveRecord::Base class, so you can call it if you have a reference to ActiveRecord::Base or any of its subclasses (that is exactly what we’re doing).
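If you want to play with this pattern without a Rails project, the whole mechanism can be reproduced in plain Ruby. This is only a sketch under some assumptions: FakeBase stands in for ActiveRecord::Base, and ActsAsTaggable, Photo and Receipt are made-up names that keep their data in memory instead of a database.

```ruby
# Hypothetical stand-in for ActiveRecord::Base, for illustration only.
class FakeBase
end

module ActsAsTaggable
  module ClassMethods
    # Called in a class body; mixes the instance methods into that class only.
    def acts_as_taggable
      include InstanceMethods
    end
  end

  module InstanceMethods
    def tags
      @tags ||= []
    end

    def tag_with(name)
      tags << name
      self
    end
  end
end

# Same move as the init.rb line: the base class object gets the new method,
# and every subclass can call it.
FakeBase.extend ActsAsTaggable::ClassMethods

class Photo < FakeBase
  acts_as_taggable
end

class Receipt < FakeBase
  # never calls acts_as_taggable, so it gets no instance methods
end

puts Photo.new.tag_with('ruby').tags.inspect
puts Receipt.respond_to?(:acts_as_taggable)
puts Receipt.new.respond_to?(:tag_with)
```

Every subclass can call acts_as_taggable because the base class was extended, but only the classes that actually call it get the instance methods.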
Now that we have the code hooked to ActiveRecord, let’s create a simple model to try some tests, create our NewsArticle model:
script/generate model NewsArticle title:string article:text
Now, at the news_article.rb file:
#app/models/news_article.rb
class NewsArticle < ActiveRecord::Base
  acts_as_votable
  validates_presence_of :title, :article
end
We just call the acts_as_votable class method, which is available because we “extended” ActiveRecord::Base, the superclass of our NewsArticle class, with the ActsAsVotable::ClassMethods module. Here’s an example of what you could do with our models:
article = NewsArticle.create(:title => 'sample', :article => 'sample')
#calling the cast_vote method from the ActsAsVotable::InstanceMethods module
article.cast_vote :up
article.cast_vote :down
#accessing the votes association defined when you called the acts_as_votable method
article.votes
And that’s it, you now know how and when to include or extend modules and even how to build a simple acts_as plugin for your models.
PS: You can get the full code for this example here.
Handling database indexes for Rails polymorphic associations
One thing that is usually overlooked when defining tables and their associations in a Rails application is the indexes. Usually, this comes from the idea that “my ORM tool does the job”, and in fact that might be true sometimes. One of the most successful ORM tools in the Java land, Hibernate, generates a database with indexes for all the foreign keys that you have, so Java programmers that use it don’t really worry about these issues (at least not until their database is slowing down to death).
ActiveRecord migrations, on the other hand, don’t really worry about these things (unless you’re using the cool Foreign Key Migrations plugin); you must define the indexes that you need yourself. Usually this is done with a simple call like this:
add_index :comments, :user_id
This will create an index for the column :user_id at the :comments table. For simple associations this is straightforward, but ActiveRecord offers goodies that are not so common in other tools and one of them is the “polymorphic associations”. With polymorphic associations you can define an association without defining the kind of the object you will be associated with, you just say that it’s a polymorphic association and you’re done. The code would look like this:
class Comment
  belongs_to :commentable, :polymorphic => true
  belongs_to :user
end
To make this work, at the database level you would need two columns at the :comments table, one called :commentable_id, that will hold the id of the object that owns the comment, and another called :commentable_type, that will hold the full class name of the object that owns the comment. So, if you’re commenting in a Post object with an ID of 1, the commentable_id would be 1 and the commentable_type would be “Post”. At the Post model the association would look like this:
class Post
  has_many :comments, :as => :commentable
  has_one :user
end
When you’re using polymorphic associations, your queries for the object will usually contain both the commentable_id and the commentable_type in the where clause, as you would be looking for comments with a commentable_id of 1 and a commentable_type of “Post”, so it makes sense to create indexes for these columns. As you’ve already seen, you could do this with the following code:
add_index :comments, :commentable_id
add_index :comments, :commentable_type
And now you have two indexes, one for each column. Your database searches should fly with this, shouldn’t they?
Well, not really. You’re defining two different indexes, one for each column, but you almost never search on them separately; you’re always searching on :commentable_id and :commentable_type together, so you should create one index over both columns instead of one for each of them. The call should be something like this:
add_index :comments, [ :commentable_type, :commentable_id]
This is going to generate a single index covering both columns, and your queries for your polymorphic models will now really be faster than before.
Obviously, you can also create separate indexes for the :commentable_type and :commentable_id columns if you do search on them separately, but having a lot of indexes on your table slows down inserts and updates and also takes up more space on disk. So, when defining polymorphic associations, remember to create one index over both columns and not just one for each of them.
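To get a feeling for the difference, here is a plain Ruby analogy. It is only an analogy: real databases use structures like B-trees and a query planner, not hashes, and all the names below (Comment as a Struct, the constants, the two lookup methods) are made up for illustration.

```ruby
# Toy stand-in for rows of the comments table (hypothetical data).
Comment = Struct.new(:id, :commentable_type, :commentable_id)

COMMENTS = [
  Comment.new(1, 'Post', 1),
  Comment.new(2, 'Photo', 1),
  Comment.new(3, 'Post', 2),
  Comment.new(4, 'Post', 1)
]

# A single-column "index": it narrows the search only partially.
BY_ID = COMMENTS.group_by { |comment| comment.commentable_id }

# A composite "index", keyed by both columns at once.
BY_TYPE_AND_ID = COMMENTS.group_by do |comment|
  [comment.commentable_type, comment.commentable_id]
end

# With one column indexed we still have to filter the candidates by hand.
def with_single_index(type, id)
  BY_ID.fetch(id, []).select { |comment| comment.commentable_type == type }
end

# With the composite key the lookup lands directly on the matching rows.
def with_composite_index(type, id)
  BY_TYPE_AND_ID.fetch([type, id], [])
end

puts with_single_index('Post', 1).map(&:id).inspect
puts with_composite_index('Post', 1).map(&:id).inspect
```

Both calls return the same comments; the difference is how much leftover filtering is needed after the index lookup, which is the work the composite index saves.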
And before you go, when ActiveRecord creates a string column at the database level, you can define a :limit option that defines the size of the VARCHAR column at the database. If you don’t give a limit, it’s going to be set as a VARCHAR(255) and I really believe you will not have a model class with a name that has 255 characters, so, instead of creating a column with an unreasonable size (that is going to slow down queries and generate bigger indexes), give it a limit that’s real. Our final table definition would look like this one:
create_table :comments do |t|
  t.integer :user_id
  t.integer :commentable_id
  t.string :commentable_type, :limit => 20 #could be even less
  t.text :comment
end
add_index :comments, :user_id
add_index :comments, [:commentable_id, :commentable_type]
Farewell, but not goodbye
Running Codevader for the last year has been really exciting and we’ve worked on some great projects with some awesome clients. One of those clients was a startup that has now hired the entire Codevader team, including myself.
So Codevader will not be taking on any more work. If you find your way to the Codevader website because you’re looking for a team to build you something, contact me anyway, since I know many other developers who are always looking for work.
What happens to CustomerFu? We’re licensing CustomerFu to Joerg Diekmann at These Lovely Days. He’ll be relaunching the site in the near future and will be operating it thereafter.
Our startup will be launching in public beta in November. We’re keeping the Codevader blog to post about Ruby and Rails stuff as we work on the project. The new project will also have its own non-technical, business-focused blog.
This blog has been really quiet for the last few months while all this was going down, but we hope to get back to more regular posting.
Building a recommendations service
I have an interesting task on my hands: building a recommendations engine based on user ratings. Basically, the engine should be able to find users that are likely to have the same taste and use this information to recommend items to them. The tools used for that go by the name of Collective Intelligence, which uses information generated by a group of people to make predictions, offer recommendations, find out which of them have similar tastes and things like that.
One of the best texts about it is the “Programming Collective Intelligence” book by Toby Segaran, published by O’Reilly. Toby covers topics that span from recommendations engines to classification algorithms and all this in a simple way that even people not really used to statistics and math (just like me) can understand and effectively use the samples to build things that work.
My idea for the task is to build a generic infrastructure to perform recommendations. Just like we have full text indexing tools, we could also have recommendations tools, as the tools really don’t care about what you’re recommending; they just need their data set and time to run the algorithms and perform their calculations. In these first steps, I’m still experimenting with the ideas and porting the code found in the book from Python to Ruby (and that isn’t really hard, Python is almost as readable as Ruby). You can get the full source at its GitHub repository.
The current application has users, movies and ratings, and every user can rate any movie once. Beyond that, we have the first two user similarity algorithms ported, and here is where things start to get interesting. The algorithms live in a plugin folder called acts_as_recommable (this is not a definitive name) and the plugin itself defines only one class, CodeVader::RecommendationsService. This class is the registry where all similarity algorithms will live, and it’s also the class you use when performing similarity tests.
Ideally, all similarity algorithms would be registered at this class and you would be able to choose which algorithm you want to use for your comparison. Here’s an example of how an algorithm would be registered (this is the Euclidean distance algorithm):
CodeVader::RecommendationsService.register :euclidean_distance, true do |first_user_ratings, last_user_ratings|
  sum_of_squares = 0
  0.upto( first_user_ratings.size - 1 ) do |index|
    sum_of_squares += (first_user_ratings[index].score - last_user_ratings[index].score) ** 2
  end
  1.0/( 1 + sum_of_squares )
end
You would just have to call the “register” method and pass a closure containing the algorithm; this closure would then be stored in the RecommendationsService class and be available to the whole application. Using this “registry” based approach simplifies the addition of new algorithms if and when they are needed and allows users to select the best algorithm for their needs.
To be able to run any of the algorithms you have to load a list of common ratings for two users. Currently this is done by an ugly SQL query (well, not that ugly). If the users don’t have any common ratings, the algorithm cannot be run, so you will need some ratings for at least two users to start using the application.
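The registry idea itself fits in a few lines of plain Ruby. The sketch below is not the plugin’s code: SimilarityRegistry and its “similarity” and “registered” methods are made-up names, the second argument that the real “register” call receives is left out because the post doesn’t explain it, and the closures take plain arrays of scores instead of Rating objects.

```ruby
# Hypothetical, simplified stand-in for a registry of similarity algorithms.
class SimilarityRegistry
  @algorithms = {}

  class << self
    # Stores the block under the given name so it can be looked up later.
    def register(name, &algorithm)
      @algorithms[name] = algorithm
    end

    # Runs the named algorithm; fetch raises KeyError for unknown names.
    def similarity(name, first_scores, last_scores)
      @algorithms.fetch(name).call(first_scores, last_scores)
    end

    def registered
      @algorithms.keys
    end
  end
end

# Same formula as the Euclidean distance example, on arrays of numbers.
SimilarityRegistry.register :euclidean_distance do |first_scores, last_scores|
  sum_of_squares = first_scores.zip(last_scores).inject(0) do |sum, (a, b)|
    sum + (a - b)**2
  end
  1.0 / (1 + sum_of_squares)
end

puts SimilarityRegistry.registered.inspect
puts SimilarityRegistry.similarity(:euclidean_distance, [5.0, 3.0], [5.0, 3.0])
puts SimilarityRegistry.similarity(:euclidean_distance, [5.0, 3.0], [4.0, 1.0])
```

Adding another algorithm is just one more “register” call with a different name, and the code that asks for a similarity never has to change.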
Currently, we have the Euclidean distance and the Pearson correlation algorithms implemented and you can see both of them in action by viewing a user’s page in the application. At the users page you will see links to perform comparisons with other users based on the two algorithms and from what I could try, the Pearson correlation seems to be the one giving the best results.
Both of the algorithms work by loading the lists of common ratings, performing some math on the scores and generating a number between 0 and 1, with 0 meaning no similarities and 1 meaning that they have exactly the same scores for all ratings. With this in mind, every data set must include a “score”. It doesn’t have to be something like from 0 to 5, it doesn’t even have to be positive numbers, but you have to define different values for each of the possible cases. In our movies example, the score ranges from 0.0 to 5.0. The current Rating model doesn’t care which model you’re rating, just that the same user can’t rate it twice and that the scores are numbers.
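For reference, here is a sketch of the Pearson correlation on two plain arrays of scores. This is the textbook formula written from scratch, not the code in the repository, and the method name is my own. One caveat: the raw coefficient ranges from -1 to 1, and how the application maps it to the 0 to 1 range isn’t shown in this post, so the sketch returns the raw value.

```ruby
# Textbook Pearson correlation for two equally sized arrays of numbers.
# Returns 0.0 when there is nothing to compare or when one side has no variation.
def pearson_correlation(first_scores, last_scores)
  n = first_scores.size
  return 0.0 if n.zero?

  sum1 = first_scores.inject(0.0) { |sum, score| sum + score }
  sum2 = last_scores.inject(0.0) { |sum, score| sum + score }
  sum1_sq = first_scores.inject(0.0) { |sum, score| sum + score**2 }
  sum2_sq = last_scores.inject(0.0) { |sum, score| sum + score**2 }
  product_sum = first_scores.zip(last_scores).inject(0.0) { |sum, (a, b)| sum + a * b }

  numerator = product_sum - (sum1 * sum2 / n)
  spread1 = sum1_sq - sum1**2 / n
  spread2 = sum2_sq - sum2**2 / n
  # Guard against division by zero (and tiny negative rounding errors).
  return 0.0 if spread1 <= 0 || spread2 <= 0

  numerator / Math.sqrt(spread1 * spread2)
end

# Same taste on a different scale: one user simply rates everything higher.
puts pearson_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
# Opposite taste.
puts pearson_correlation([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

The first pair shows why Pearson can beat the Euclidean distance: a user that consistently gives higher scores still correlates perfectly with a user that has the same preferences.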
As you might expect, none of the algorithms will work well with small data sets; both users need a bunch of ratings for the results to be really accurate. For instance, two users that have just one rating in common, with the same score for a specific movie, will get a similarity value of 1, even though you surely don’t have enough information to be sure of that.
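The single rating case is easy to reproduce with the Euclidean formula. The helper below is a made-up standalone version that works on plain arrays of scores, just to show the arithmetic.

```ruby
# Standalone version of the Euclidean similarity on plain arrays of scores.
def euclidean_similarity(first_scores, last_scores)
  sum_of_squares = first_scores.zip(last_scores).inject(0) do |sum, (a, b)|
    sum + (a - b)**2
  end
  1.0 / (1 + sum_of_squares)
end

# One movie in common, same score: the sum of squares is 0, so 1.0 / (1 + 0).
puts euclidean_similarity([4.0], [4.0])

# Three movies in common, agreeing on only one of them.
puts euclidean_similarity([4.0, 1.0, 5.0], [4.0, 5.0, 2.0])
```

The first call returns the maximum similarity from a single data point, which is exactly the overconfident result described above.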
So, now I’m sailing towards grouping people that are alike and generating ratings based on this similarity. The sample code for the application is available (some specs are failing, they’ll be fixed) and any ideas for improvements and comments are welcome.
Keep an eye on this blog and the GitHub repo.
PS: Another book that seems to be interesting is this one; my MEAP copy has just arrived and I should be commenting on it soon.