Full text search on Rails using the acts_as_solr plugin
Now let’s learn with a simple example how you can use this plugin to add full text search functionality to your models. But before that, let’s check out acts_as_solr and Solr features:
Features
Solr features
- Based on the proven and widely known Lucene search library
- Using the fast and lightweight Jetty HTTP server
- Many filters, plugins and complements available from the community (stemmers, charset converters, stop-words filters with lists in many languages)
acts_as_solr features
Warming up
To start using this plugin you’ll have to set up a Rails application ( rails -d mysql acts_as_solr_sample ) and then install the plugin like this:
ruby script/plugin install git://github.com/mauricio/acts_as_solr.git
To be sure that the plugin was correctly installed, check if there’s a file at “RAILS_ROOT/config/solr.yml” and a folder at “RAILS_ROOT/config/solr”. If both the config file and folder exist, you’re ready to start using acts_as_solr in your application.
As Solr is based on Lucene, which is a Java full text search engine, you’ll also have to install a Java Runtime Environment (JRE) on your machine if you don’t already have one. You can find the latest version here - http://java.com/
Configuration
The first configuration file we have to check out is the solr.yml:
# Config file for the acts_as_solr plugin.
#
# If you change the host or port number here, make sure you update
# them in your Solr config file
development:
  url: http://localhost:8982/solr
  # uncomment this line if you want to have Solr errors raised at your application
  # if this property is undefined or set to false the errors will be logged
  # using the rails logger but they will not be raised to the application
  # raise_error: true
production:
  url: http://localhost:8983/solr
  # raise_error: true
This is the configuration used to start the Solr server and also the configuration that the plugin will use to make requests to this server, so you need to be sure that the host in the url is the real name of the machine that is going to host the Solr server, especially in production. Every environment can use its own configuration, and the raise_error option tells acts_as_solr whether the errors received when trying to talk to the Solr server should be raised to your application or not. The default value is “false”, which means that the errors are not going to be raised to your application but they will be logged using the Rails logger. We’ll get back to error handling later.
The files under “RAILS_ROOT/config/solr” are the heart of your Solr server configuration. They tell Solr which filters and field configurations should be used, and that’s also where your index files are going to be stored. The Solr index files live at “RAILS_ROOT/config/solr/data/RAILS_ENV”. Be sure to ignore the index folders when pushing code to your source control system.
Using the plugin
Now it’s time to get your hands dirty using the plugin for real, let’s start building the model that’s going to be searched, the NewsStory:
class NewsStory < ActiveRecord::Base
  acts_as_solr
  validates_presence_of :title, :description
  def to_s
    title
  end
end
And here’s the migration that’s going to create it:
class CreateNewsStories < ActiveRecord::Migration
  def self.up
    create_table :news_stories do |t|
      t.string :title, :null => false
      t.text :description
      t.timestamps
    end
  end
  def self.down
    drop_table :news_stories
  end
end
With our model created and the migration run (rake db:migrate) we’ll have to start the Solr server:
rake solr:start
After this you should get some output like this:
Solr started successfully on 8982, pid: 12770.
2009-05-29 15:00:09.853::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
2009-05-29 15:00:09.966::INFO: jetty-6.1.18
2009-05-29 15:00:10.027::INFO: Extract file:/home/mauricio/NetBeansProjects/acts_as_solr_sample/vendor/plugins/acts_as_solr/jetty/webapps/solr.war to /tmp/Jetty_localhost_8982_solr.war__solr__6dieve/webapp
2009-05-29 15:00:10.972::INFO: Opened /home/mauricio/NetBeansProjects/acts_as_solr_sample/log/development_2009_05_29.request.log
2009-05-29 15:00:10.992::INFO: Started SelectChannelConnector@localhost:8982
The Jetty server that loads the Solr webapp is now ready to begin indexing your data and answering search calls. Let’s add some news stories to the database (fire up your “ruby script/console”):
NewsStory.create(
  :title => 'acts_as_solr rocks',
  :description => 'a simple and easy way to do full text searching in your rails app' )
NewsStory.create(
  :title => 'couchdb is the next big thing',
  :description => 'you should start paying attention to it, nice and easy way to store and search data' )
Now it’s time for us to search for them using Solr:
news_stories = NewsStory.find_by_solr( 'easy' )
news_stories.each { |news_story| puts news_story.title }
You should receive an object (an ActsAsSolr::SearchResults) with the two news stories you just persisted to your database. This object behaves just like a common Array, so you can use it anywhere you’d expect to use an Array, but it also implements the same methods found in the will_paginate collection, so you can also use it with your will_paginate view helpers.
And guess what? That’s it!
Now you have full text search working in your application in an almost effortless way. Read the plugin docs to get a better feel of the options you can use at the acts_as_solr and find_by_solr methods and you’re ready to go live using one of the most advanced open source search tools available today.
You can find the plugin here at GitHub - http://github.com/mauricio/acts_as_solr/tree
with_scope and named_scopes ignoring stacked :order clauses
If you’ve been using with_scope and named_scopes a lot with ActiveRecord you have probably noticed that the :order clauses defined at the scopes are lost and only the first :order clause is used. If you defined an :order clause you’d like to have it merged with the other ones already provided. Here’s a simple example:
class User
  named_scope :by_first_name, :order => "#{quoted_table_name}.first_name ASC"
  named_scope :by_last_name, :order => "#{quoted_table_name}.last_name ASC"
end
Our user has two named scopes defined and both of them define an :order clause, if we try to run a finder like this:
User.by_first_name.by_last_name.all
This is the generated query:
SELECT * FROM `users` ORDER BY `users`.first_name ASC
As you’ve noticed, only the first :order clause was used, the last one was lost. Our ideal SQL query would have to look like this, with both :order clauses being used:
SELECT * FROM `users` ORDER BY `users`.last_name ASC , `users`.first_name ASC
That’s why we’re going to hack the with_scope method a little bit to reach our goal. This issue was already reported to the Rails issue tracker but there’s no fix yet, so our only hope is to monkeypatch Rails to behave as we expect it to. Here’s a really simple fix for the problem:
ActiveRecord::Base.class_eval do
  class << self
    def merge_orders( *orders )
      orders.join( ' , ' )
    end
    def with_scope_with_hack(method_scoping = {}, action = :merge, &block)
      method_scoping = method_scoping.method_scoping if method_scoping.respond_to?(:method_scoping)
      # Dup first and second level of hash (method and params).
      method_scoping = method_scoping.inject({}) do |hash, (method, params)|
        hash[method] = (params == true) ? params : params.dup
        hash
      end
      method_scoping.assert_valid_keys([ :find, :create ])
      if f = method_scoping[:find]
        f.assert_valid_keys(VALID_FIND_OPTIONS)
        set_readonly_option! f
      end
      # Merge scopings
      if [:merge, :reverse_merge].include?(action) && current_scoped_methods
        method_scoping = current_scoped_methods.inject(method_scoping) do |hash, (method, params)|
          case hash[method]
            when Hash
              if method == :find
                (hash[method].keys + params.keys).uniq.each do |key|
                  merge = hash[method][key] && params[key] # merge if both scopes have the same key
                  if key == :conditions && merge
                    if params[key].is_a?(Hash) && hash[method][key].is_a?(Hash)
                      hash[method][key] = merge_conditions(hash[method][key].deep_merge(params[key]))
                    else
                      hash[method][key] = merge_conditions(params[key], hash[method][key])
                    end
                  elsif key == :include && merge
                    hash[method][key] = merge_includes(hash[method][key], params[key]).uniq
                  elsif key == :joins && merge
                    hash[method][key] = merge_joins(params[key], hash[method][key])
                  elsif key == :order && merge
                    hash[method][key] = merge_orders(params[key], hash[method][key])
                  else
                    hash[method][key] = hash[method][key] || params[key]
                  end
                end
              else
                if action == :reverse_merge
                  hash[method] = hash[method].merge(params)
                else
                  hash[method] = params.merge(hash[method])
                end
              end
            else
              hash[method] = params
          end
          hash
        end
      end
      self.scoped_methods << method_scoping
      begin
        yield
      ensure
        self.scoped_methods.pop
      end
    end
    alias_method_chain :with_scope, :hack
  end
end
You can place this code at an initializer (maybe called with_scope_fix.rb) or at your lib folder and require it in your initializers. And now all your :order clauses defined by named_scope or with_scope calls will be correctly merged and will not be lost in your code.
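To see the one branch that really changes without wading through the whole method, here is a minimal plain-Ruby sketch of the :order merge. The merge_find_scopes helper is hypothetical and exists only for illustration (it is not part of ActiveRecord), and it deliberately ignores the dedicated merging that with_scope does for :conditions, :include and :joins.

```ruby
# Hypothetical helper, not part of ActiveRecord: merges two :find scope
# hashes the way the patched with_scope treats the :order key.
def merge_find_scopes(outer, inner)
  (outer.keys + inner.keys).uniq.inject({}) do |merged, key|
    if key == :order && outer[key] && inner[key]
      # Both scopes define :order, so keep both instead of dropping one.
      merged[key] = [outer[key], inner[key]].join(' , ')
    else
      # Any other key keeps the inner value when present, else the outer one.
      merged[key] = inner[key] || outer[key]
    end
    merged
  end
end

outer = { :order => '`users`.last_name ASC', :limit => 10 }
inner = { :order => '`users`.first_name ASC' }

puts merge_find_scopes(outer, inner)[:order]
```

The outer scope comes first in the joined clause, which matches the argument order used in the merge_orders call of the patch.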
Google sitemaps made easy
Now that Talkies is live, we obviously need to get word out that we exist. Part of that is making sure that Google can find all our pages. Since not all movies or actors are accessible directly via a link, we needed to implement a sitemap (sitemaps.org) file that we can submit to Google.
Easiest way to do that was with Alex Rabarts’ big_sitemap gem. Implementation was straightforward, and we’ve got a cron job that runs once a day to keep the sitemaps updated.
Updating a new record with an after_save callback
We have a Photo model in which we want to store a list of actors featured in that photo, to make it easier for Solr to search photos. When a new photo is uploaded, users can associate one or more actors with the photo. These arrive from our form submission as actor ids.
So we’d like to have an after_save callback that looks up the actor names and adds them to our special index field. Problem is, if we have this
class Photo < ActiveRecord::Base
  after_save :set_index_representations
  private
  def set_index_representations
    update_attribute :index_repr_of_actors, actors.all.collect{|p| p.full_name }.join(" ")
  end
end
then our after_save gets called again after we’ve updated our new field. Oops! Endless loop.
We need a way of updating the record without calling after_save again. The solution is update_all.
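def set_index_representations
  names = actors.all.collect{ |p| p.full_name }.join(" ")
  # placeholders let ActiveRecord quote the values, so a name containing an apostrophe can't break the SQL
  Photo.update_all( [ "index_repr_of_actors = ?", names ], [ "id = ?", id ] )
end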
Just be sure to specify the condition, otherwise you’ll end up updating all records!
Rails Summit Latin America
Well, I’m at the Guararapes Airport in Recife and it’s time to review what I’ve seen these two days at the Rails Summit. The first thing to say is that, even without much time to get everything done, the organization did pretty well. The event was well planned, the places were big enough to hold everyone and, most important, it was about time that the Brazilian and Latin American Ruby and Rails community had its own “Rails Conf”. So here goes a big compliment to Fabio Akita and the guys at Locaweb for making this possible.
Day 1
The opening keynote with Gilberto Mautner (Locaweb’s founder and CIO) and Fabio Akita was a nice introduction to Locaweb’s history and also our little Ruby and Rails community history, with the first Portuguese written books about Ruby and Rails. A nice introduction for one of the biggest Rails events Brazil has ever seen (we had another big one in Rio, the “Rio on Rails”, not that long ago…).
DHH
And then came the real content. The first presentation was given by DHH, and the idea was that people would ask questions and he would answer them. I asked David about thread-safety in Rails, whether it’s now thread-safe from top to bottom and how this would impact our current Rails deployments. He said that while it was nice that they had done it, only JRuby would be able to get something out of it, as the MRI (Matz’s Ruby) doesn’t use native threads and can’t really be “concurrent”. Also, some plugins might need some changes to keep working (I’ve already had to struggle with masochism to get it working with edge rails).
The idea of having him answering questions was cool, but I don’t think it really worked, the questions were too fragmented and maybe it would be better to let him just talk about some interesting topic or something like that (or maybe not even bothering to call him at all, more about that later).
Chad Fowler
Next one was Chad Fowler and he was the star of the first day with a great presentation about what he wrote in the “My Job Went to India” book (and there’s a new version coming). A lot of interesting tips to make you a better developer and how being a great developer isn’t going to make your job fly to India. Unfortunately, the event agenda said his talk was about “Evolving a Framework” and I couldn’t see any of this in there, which wasn’t really cool. He gave an awesome presentation, but it wasn’t what I was expecting it to be.
George and Danilo
Then we got George Malamidis and Danilo Sato talking about REST architectures and a little bit about the Rails RESTful way, it was a great introduction to the topic. One of the most interesting things that I got is that you can use proxies (like Apache and Squid) to perform caching of pages with “question marks” (eg. “/articles?id=1”), even if this doesn’t work with rails itself. Obviously, if you’re using REST you shouldn’t have resources accessible with URLs like that one, but sometimes you don’t really have a choice.
Dr. Nic
Well, this one was definitely cool, but almost everything Dr. Nic said here had already been said by Chad in his presentation. Obviously it was not his fault and he gave a great presentation about how you can use the tools available to develop software collaboratively and how to make yourself well known in the community, but we already knew that and it made his presentation lose some of its glow.
At this point, I was completely overwhelmed by sleep so I headed to the hotel to get some sleep and didn’t see Chris Wanstrath’s presentation. The people who attended it explained it more or less like this:
“He got a piece of paper, read it out loud from head to tail and didn’t say the GitHub word not even once”.
Maybe it was better for me going to sleep early.
Day 2
Phusion guys
This one started the day at high voltage. The Phusion guys are definitely cool and gave an awesome presentation about a lot of scalability myths in and out of Rails and, obviously, about Passenger and Ruby Enterprise, their flagship products. If you haven’t tried them, go do it now!
Charles Nutter and Tom Enebo
To tell you the truth, this was the talk that made me travel two hours by bus and 3 by plane to reach São Paulo. Unfortunately the JRuby guys couldn’t come to Brazil (passports and customs…) so they gave their presentation using Skype and sometimes the sound was impossible to understand. Beyond that, the talk was interesting, with an introduction about JRuby, what it could do, where it’s being used and where they are heading. The most interesting news was the work towards enabling native libraries written in C in JRuby as they work in Ruby. This would give us the solution for the last JRuby problem, that is, the Ruby gems and libraries that depend directly on C code.
Jay Fields
Jay did a presentation about how our testing is immature and how the tools that we use have some cool features and a lot of drawbacks. He walked through Selenium, Test::Unit, RSpec and expectations, and gave examples of how they work and how they can fail you in some scenarios. The message was: take the tool that you like, but understand the drawbacks and issues it will insert into your code and tests to be sure that it’s exactly what you’re looking for. But one thing I can’t really accept is that he said that we don’t have best practices or patterns in testing, that we’re too new and we’re still experimenting with things. IMHO, we have a lot of best practices in the testing community, we even have books written about it, so stating that there are no well known patterns and best practices is being a bit too radical.
David Chelimsky
And here comes the second best presentation of the day, with David Chelimsky talking about RSpec and Cucumber (the new story runner). He spoke about testing, about TDD, about how to notice and avoid code smells, how to design classes and even had some time to talk about RSpec, Cucumber and show some live examples (he’s even able to speak a little bit of Portuguese). I wasn’t really that interested in the Story Runner, but having him talk about it and show some live examples is definitely changing my ideas about that.
Phillipe Hanrigou
Phillipe gave a presentation about testing (third one today…) and about how to make your tests run faster using a Selenium grid. I was definitely interested in this presentation, hoping that he was going to show something about DTrace, debugging or profiling, but most of the talk was about testing and how to get things happening faster. It was interesting, but nothing that new.
Fabio Kung
Well, it’s definitely hard to talk about someone that you know, but Fabio’s presentation was a cool introduction about why you should care about running Ruby on Java and why you should care about integrating Java with Ruby. He spoke about how the JVM is designed, why it’s so fast at running any kind of code and how people deploying rails applications can get a big performance boost (especially with the new concurrent rails). He also showed a demo of how to use the jetty_rails plugin to run your rails application with Jetty.
Obie Fernandez
And then we reached the ending keynote with one of the greatest guys in the Rails community, Obie Fernandez. He closed the event with the best presentation given at it. He spoke about his company, Hashrocket, how they manage to build their applications, what they do to get things done and how Ruby, Rails and agile development play a big part in it all. It was great to see someone with his experience speaking about entrepreneurism and about how to build a cool environment to get a lot of work done without making the people feel they are wasting their lives. And as he says, never lose sight of the fun in what you’re doing.
Concluding
The event had its problems but it was definitely nice in the end. I got to meet a lot of people I only knew over IM tools and to meet again with some old friends. For me, the biggest problem was the lack of “deep” talks. Most of what was shown were introductions to common things and we had 3 presentations about testing (obviously, every presenter had their own talk, but it would be nice to have some other topics in there). We had some big guys like DHH, Dr. Nic and Chad Fowler, and I think they could all be better used with deeper talks about not so introductory and “basic” material. I know that not everyone that was there is working with Ruby and Rails daily, but there were two tracks and I was expecting that at least one of them would be more about new things and interesting stuff you hadn’t heard about.
But anyway, it’s the first one and they didn’t have that much time to get everything up and running, so let’s look forward to seeing what next year will bring us. And yet again, my most sincere compliments to Fabio Akita and Locaweb for making this all possible!
Including and extending modules in Ruby
One of the coolest features in Ruby is the existence of modules and the possibility of including their implementation in any object. This simple behavior is the source of things like the Enumerable module, which gives you a bunch of methods to work with a collection of objects and just expects the class that included it to define an “each” method. You write a class, define an “each” method, include Enumerable and you’re done, all Enumerable methods are available to you.
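As a quick illustration of that contract, here is a small sketch. The Playlist class and its songs are made up for this example; the only thing it implements by hand is “each”, and everything else comes from Enumerable.

```ruby
# Playlist is a hypothetical class used only for illustration.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # The only method Enumerable needs from us.
  def each
    @songs.each { |song| yield song }
  end
end

playlist = Playlist.new('Intro', 'Bridge', 'Coda')

# map, include? and sort are all provided by Enumerable on top of each.
puts playlist.map { |song| song.upcase }.inspect
puts playlist.include?('Bridge')
puts playlist.sort.first
```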
Another example is the Comparable module. When you include the Comparable module in your class, you must define the <=> operator (the UFO operator), and the Comparable module will give you the implementation of the following operators/methods:
< , <= , > , >= , ==, between?
This is usually why we call them mixins, because they are “mixing in” their behaviors (their methods/messages) into our objects. The idea of mixins serves a purpose similar to that of multiple inheritance, that is, to inherit an implementation from “something” without having to be a direct child of that “something”. With multiple inheritance you would be able to inherit from as many classes as you wanted to. In Ruby we don’t have multiple inheritance, but we can include as many modules as we want, so they give us the same feature without all the hassle that multiple inheritance usually brings to a language.
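Here is what mixing in Comparable looks like in practice. The Version class is a made-up example; it only defines the UFO operator and gets the comparison methods from the module.

```ruby
# Version is a hypothetical class used only for illustration.
class Version
  include Comparable

  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # The one method Comparable asks for; arrays compare element by element.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

older = Version.new(1, 2)
newer = Version.new(2, 0)

puts older < newer
puts older == Version.new(1, 2)
puts newer.between?(older, Version.new(3, 0))
```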
The method resolution mechanism is pretty simple, first, if a method in a module that is being included is already defined in the class that is including it, the method of the class has precedence (which means that the method on the module will be ignored). If two modules define a method with the same name, the method on the last module included will be the one available at the class that has included both modules (remember that in Ruby there is no method overloading mechanism). Here’s an example of how it works:
module SimpleModule
  def a_method
    puts 'a_method at module'
  end
  def another_method( parameter )
    puts "Calling another method with parameter -> #{parameter}"
  end
end
module AnotherModule
  def another_method
    puts 'Calling another method without a parameter'
  end
end
class SimpleClass
  include SimpleModule
  include AnotherModule
  def a_method( param )
    puts "a_method at class -> #{param}"
  end
end
instance = SimpleClass.new
#calling the method defined on the class
instance.a_method 'parameter'
#calling method on the AnotherModule
instance.another_method
#this line will throw a 'wrong number of arguments' error
instance.a_method
An ugly example for an ugly practice, don’t rely on these things when you’re writing your own modules, strive to create unique modules that aren’t going to have method names clashing when they are included in other classes. If you have to rely on these rules to write and use your modules, maybe there is a problem in your code or in what you’re trying to do.
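If you ever need to check which definition wins, you can ask the class for its ancestors, which lists the class and its modules in the order Ruby searches them. The sketch below uses made-up names (First, Second, Greeter, LoudGreeter) just to show the lookup order.

```ruby
# All names here are hypothetical and exist only for illustration.
module First
  def greet
    'hello from First'
  end
end

module Second
  def greet
    'hello from Second'
  end
end

class Greeter
  include First
  include Second
end

class LoudGreeter
  include First
  include Second

  def greet
    'HELLO FROM THE CLASS'
  end
end

# The class comes first, then the modules, last included first.
puts Greeter.ancestors.first(3).inspect
puts Greeter.new.greet
puts LoudGreeter.new.greet
```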
Extending methods
As the title of this post says, you can include and also extend modules, but what does it mean to extend a module?
When you extend a module, you are adding the methods of that specific module into the object instance you call “extend”. So, the methods of that module will only be available at that specific instance (and not all objects of that class), other objects of the same class will not have the methods of the module available. With this, you can add specific behaviors to just one object of your system, without changing the other ones. Here’s an example:
module InstanceMethods
  def simple_method
    puts "im a method that belongs to an instance"
  end
end
class SimpleObject
end
object = SimpleObject.new
object.extend InstanceMethods
object.simple_method
another_object = SimpleObject.new
#the following line will throw an error, as this instance doesn't extend the module
another_object.simple_method
This might look like a weird feature, how many times have you wanted to introduce a method into a single object?
Not that many, probably, unless this instance is in fact an instance of the Class class (that contains the class methods of your object), and this is where extending modules get interesting and this is how many of the Rails plugins are written, let’s see how we can use this to write our own acts_as_votable plugin.
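Before moving to Rails, here is a tiny sketch of that idea with plain Ruby objects. Findable and Movie are made-up names; the point is only that extending the class object gives you a class method, while instances are left alone.

```ruby
# Findable and Movie are hypothetical names used only for illustration.
module Findable
  def find_by_name(name)
    # self is the class object that extended the module
    "#{self.name} looking for #{name}"
  end
end

class Movie
end

# Movie is itself an object (an instance of Class), so it can be extended.
Movie.extend Findable

puts Movie.find_by_name('Alien')
puts Movie.new.respond_to?(:find_by_name)
```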
Rails, extending and including modules
First thing to do is create your Rails project:
rails --database=mysql include_extend_modules
With the project created, we have to create our plugin (enter in your Rails project folder):
script/generate plugin acts_as_votable
This will create a folder called acts_as_votable at the vendor/plugins folder and the plugin skeleton code. The first thing to do is to create our Vote model. It’s a dead simple model, with a polymorphic relationship with a “votable” and a boolean column called “up”, representing if this vote is “up” or “down”. The vote.rb file should live at the vendor/plugins/acts_as_votable/lib folder. Here’s the model code:
#vendor/plugins/acts_as_votable/lib/vote.rb
class Vote < ActiveRecord::Base
  belongs_to :votable, :polymorphic => true
  validates_presence_of :votable
end
Now we have to create a migration to create the votes table at the database:
script/generate migration create_votes
And there is the migration code:
class CreateVotes < ActiveRecord::Migration
  def self.up
    create_table :votes do |t|
      t.integer :votable_id, :null => false
      t.string :votable_type, :limit => 15, :null => false
      t.boolean :up, :default => false, :null => false
      t.timestamps
    end
    add_index :votes, [ :votable_id, :votable_type ]
  end
  def self.down
    drop_table :votes
  end
end
After creating the Vote model and its migration, we’ll head to the acts_as_votable.rb file in our plugin folder. It’s where the code that ties the Vote model to the application will live. Here’s the code that will be in there:
#vendor/plugins/acts_as_votable/lib/acts_as_votable.rb
module ActsAsVotable
  module ClassMethods
    def acts_as_votable
      has_many :votes, :as => :votable, :dependent => :delete_all
      include InstanceMethods
    end
  end
  module InstanceMethods
    def cast_vote( vote )
      Vote.create( :votable => self, :up => vote == :up )
    end
  end
end
We have created a module called ActsAsVotable to serve as our namespace and in it we have two modules ClassMethods and InstanceMethods. The ClassMethods module defines the methods that we want to introduce at the ActiveRecord::Base class, so that we can just call “acts_as_votable” in any model that inherits from ActiveRecord::Base (just like any other ActiveRecord plugin) and the InstanceMethods module contains the methods that we want an instance that is “votable” to have.
So, if I say that a NewsArticle class is votable, its instances will have the cast_vote method, as the module InstanceMethods was included when the class called acts_as_votable. But before creating the NewsArticle model, we have to make some changes in our init.rb file for the acts_as_votable plugin. Here’s how it should look:
#vendor/plugins/acts_as_votable/init.rb
require 'vote'
require 'acts_as_votable'
ActiveRecord::Base.extend ActsAsVotable::ClassMethods
This is where we make the acts_as_votable method available to all classes that inherit from ActiveRecord::Base, and this is one of the most common uses of “extending” modules you will see, that is, adding the methods of a module to a “class” object. Making a class object extend a module will make the module methods available on that class object, which means they are now “class methods” of that class. In our example, “acts_as_votable” is now a class method of the ActiveRecord::Base class, so you can call it if you have a reference to the ActiveRecord::Base class or any of its subclasses (that is exactly what we’re doing).
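The same wiring can be tried outside Rails. The sketch below is a plain-Ruby stand-in, not the plugin itself: FakeBase plays the role of ActiveRecord::Base, and ActsAsTaggable, Article and Page are made-up names. It shows the class method being inherited by subclasses and the instance methods showing up only where the class method was called.

```ruby
# Everything here is hypothetical; FakeBase stands in for ActiveRecord::Base.
module ActsAsTaggable
  module ClassMethods
    def acts_as_taggable
      # self is the class calling acts_as_taggable
      include InstanceMethods
    end
  end

  module InstanceMethods
    def tags
      @tags ||= []
    end

    def tag(name)
      tags << name
      self
    end
  end
end

class FakeBase
  extend ActsAsTaggable::ClassMethods
end

class Article < FakeBase
  acts_as_taggable
end

class Page < FakeBase
end

puts Article.new.tag('ruby').tag('rails').tags.inspect
puts Page.respond_to?(:acts_as_taggable)
puts Page.new.respond_to?(:tag)
```

Page can still call acts_as_taggable because it inherits the class method from FakeBase, but its instances have no tag method until it does.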
Now that we have the code hooked to ActiveRecord, let’s create a simple model to try some tests, create our NewsArticle model:
script/generate model NewsArticle title:string article:text
Now, at the news_article.rb file:
#app/models/news_article.rb
class NewsArticle < ActiveRecord::Base
  acts_as_votable
  validates_presence_of :title, :article
end
We just call the acts_as_votable class method, which is available because we “extended” the ActiveRecord::Base class, the superclass of our NewsArticle class, with the ActsAsVotable::ClassMethods module. Here’s an example of what you could do with our models:
article = NewsArticle.create(:title => 'sample', :article => 'sample')
#calling the cast_vote method from the ActsAsVotable::InstanceMethods module
article.cast_vote :up
article.cast_vote :down
#accessing the votes association defined when you called the acts_as_votable method
article.votes
And that’s it, you now know how and when to include or extend modules and even how to build a simple acts_as plugin for your models.
PS: You can get the full code for this example here.
Handling database indexes for Rails polymorphic associations
One thing that is usually overlooked when defining tables and their associations in a Rails application is the indexes. Usually, this comes from the idea that “my ORM tool does the job” and in fact it might be true sometimes. One of the most successful ORM tools in Java land, Hibernate, generates a database with indexes for all foreign keys that you have, so Java programmers that use it don’t really worry about these issues (at least not until their database is slowing down to death).
ActiveRecord migrations, on the other hand, don’t really worry about these things ( unless you’re using the cool Foreign Key Migrations plugin ), so you must define the indexes that you need by yourself. Usually this is done by a simple call like this:
add_index :comments, :user_id
This will create an index for the column :user_id at the :comments table. For simple associations this is straightforward, but ActiveRecord offers goodies that are not so common in other tools and one of them is the “polymorphic associations”. With polymorphic associations you can define an association without defining the kind of the object you will be associated with, you just say that it’s a polymorphic association and you’re done. The code would look like this:
class Comment
  belongs_to :commentable, :polymorphic => true
  belongs_to :user
end
To make this work, at the database level you would need two columns at the :comments table, one called :commentable_id, that will hold the id of the object that owns the comment, and another called :commentable_type, that will hold the full class name of the object that owns the comment. So, if you’re commenting in a Post object with an ID of 1, the commentable_id would be 1 and the commentable_type would be “Post”. At the Post model the association would look like this:
class Post
  has_many :comments, :as => :commentable
  has_one :user
end
When you’re using polymorphic associations, your queries for the object will usually contain the commentable_id and the commentable_type in the where clause, as you would be looking for comments with a commentable_id of 1 and a commentable_type of “Post”, so it makes sense to create indexes for these columns. As you’ve already seen, you could do this with the following code:
add_index :comments, :commentable_id
add_index :comments, :commentable_type
And now you have two indexes, one for each column, your database searches should fly with this, shouldn’t they?
Well, they will not. You’re defining two different indexes, one for each column, but you almost never search on them separately. You’re always searching on the :commentable_id together with the :commentable_type, so you should create a single index covering both columns instead of one for each of them. The call should be something like this:
add_index :comments, [ :commentable_type, :commentable_id]
This is going to generate an index with both columns and your queries for your polymorphic models will now really be faster than before.
Obviously, you can also create indexes for the :commentable_type and :commentable_id columns if you search on them separately, but having a lot of indexes in your table slows down update calls and might also create big tables in your filesystem. So, when defining polymorphic associations, remember to create an index for both columns and not just one for each of them.
And before you go, when ActiveRecord creates a string column at the database level, you can define a :limit option that defines the size of the VARCHAR column at the database. If you don’t give a limit, it’s going to be set as a VARCHAR(255) and I really believe you will not have a model class with a name that has 255 characters, so, instead of creating a column with an unreasonable size (that is going to slow down queries and generate bigger indexes), give it a limit that’s real. Our final table definition would look like this one:
create_table :comments do |t|
  t.integer :user_id
  t.integer :commentable_id
  t.string :commentable_type, :limit => 20 # could be even less
  t.text :comment
end
add_index :comments, :user_id
add_index :comments, [:commentable_type, :commentable_id]
Building a recommendations service
I have an interesting task on my hands: building a recommendations engine based on user ratings. Basically, the engine should be able to find users that are likely to have the same taste and use this information to recommend items to them. The tools used for that go by the name of Collective Intelligence, which uses information generated by a group of people to make predictions, offer recommendations, find out which of them have similar tastes and things like that.
One of the best texts about it is the “Programming Collective Intelligence” book by Toby Segaran, published by O’Reilly. Toby covers topics that span from recommendations engines to classification algorithms and all this in a simple way that even people not really used to statistics and math (just like me) can understand and effectively use the samples to build things that work.
My idea is to build a generic infrastructure for recommendations. Just like we have full text indexing tools, we could also have recommendations tools, as the tools really don’t care about what you’re recommending. They just need their data set and time to run the algorithms and perform their calculations. In these first steps I’m still experimenting with the ideas and porting the code found in the book from Python to Ruby (and that isn’t really hard, Python is almost as readable as Ruby). You can get the full source at its GitHub repository.
The current application has users, movies and ratings, and every user can rate any movie once. Beyond that, we have the first two user similarity algorithms ported, and here is where things start to get interesting. The algorithms live in a plugin folder called acts_as_recommable (this is not a definitive name) and the plugin itself defines only one class, CodeVader::RecommendationsService. This class is the registry where all similarity algorithms will live, and it’s also the class you use when performing similarity tests.
Ideally, all similarity algorithms would be registered at this class and you would be able to choose which algorithm you want to use for your comparison. Here’s an example of how an algorithm would be registered (this is the Euclidean distance algorithm):
CodeVader::RecommendationsService.register :euclidean_distance, true do |first_user_ratings, last_user_ratings|
  sum_of_squares = 0
  0.upto( first_user_ratings.size - 1 ) do |index|
    sum_of_squares += (first_user_ratings[index].score - last_user_ratings[index].score) ** 2
  end
  1.0/( 1 + sum_of_squares )
end
You would just have to call the “register” method and pass a closure containing the algorithm. This closure would then be stored in the RecommendationsService class and be available to the whole application. This “registry” based approach simplifies the addition of new algorithms if and when they are needed, and allows users to select the best algorithm based on their needs.
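To make the registry idea concrete, here is a minimal sketch of how such a class could store and run the closures. SimilarityRegistry, its compare and registered methods, and the single-field Rating struct are hypothetical names used only for this illustration; they are not the plugin’s actual code. The second argument (true) from the call above is left out because its meaning isn’t explained here.

```ruby
# Hypothetical sketch: SimilarityRegistry is an illustrative name,
# not the plugin's actual class or API.
module SimilarityRegistry
  @algorithms = {}

  # Store the given block under a name so it can be looked up later.
  def self.register(name, &algorithm)
    raise ArgumentError, "a block is required" unless algorithm
    @algorithms[name.to_sym] = algorithm
  end

  # Look up an algorithm by name and run it on two rating lists.
  def self.compare(name, first_user_ratings, last_user_ratings)
    algorithm = @algorithms.fetch(name.to_sym) do
      raise ArgumentError, "unknown algorithm: #{name}"
    end
    algorithm.call(first_user_ratings, last_user_ratings)
  end

  # Names of all algorithms registered so far.
  def self.registered
    @algorithms.keys
  end
end

# Stand-in for the application's rating model; only the score matters here.
Rating = Struct.new(:score)

SimilarityRegistry.register(:euclidean_distance) do |first_user_ratings, last_user_ratings|
  sum_of_squares = first_user_ratings.zip(last_user_ratings).sum do |a, b|
    (a.score - b.score)**2
  end
  1.0 / (1 + sum_of_squares)
end

alice_ratings = [Rating.new(4.0), Rating.new(2.0)]
bob_ratings   = [Rating.new(3.0), Rating.new(2.0)]

similarity = SimilarityRegistry.compare(:euclidean_distance, alice_ratings, bob_ratings)
puts similarity # prints 0.5
```

Keeping the algorithms in a hash keyed by name means the calling code only needs to know the name of the algorithm it wants, and adding a new one never touches the code that performs the comparison.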
To be able to run any of the algorithms you have to load the list of common ratings for two users. Currently this is done by an ugly SQL query (well, not that ugly). If the users don’t have any ratings in common, the algorithm cannot be run, so you will need some ratings from at least two users to start using the application.
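The application does this step in SQL, but the idea can be sketched in memory. The version below is an assumption-laden illustration: the Rating struct, its field names and the common_ratings helper are invented here and are not taken from the application.

```ruby
# Hypothetical in-memory version; the application does this in SQL.
Rating = Struct.new(:user_id, :movie_id, :score)

# Returns two arrays of equal length, ordered by movie_id, so that
# position i in both arrays refers to the same movie.
def common_ratings(first_user_ratings, last_user_ratings)
  last_by_movie = last_user_ratings.each_with_object({}) do |rating, by_movie|
    by_movie[rating.movie_id] = rating
  end

  shared = first_user_ratings
           .select { |rating| last_by_movie.key?(rating.movie_id) }
           .sort_by(&:movie_id)

  [shared, shared.map { |rating| last_by_movie[rating.movie_id] }]
end

alice = [Rating.new(1, 10, 4.0), Rating.new(1, 11, 2.5), Rating.new(1, 12, 5.0)]
bob   = [Rating.new(2, 12, 3.0), Rating.new(2, 10, 4.5), Rating.new(2, 13, 1.0)]

alice_common, bob_common = common_ratings(alice, bob)

puts alice_common.map(&:movie_id).inspect # prints [10, 12]
puts bob_common.map(&:score).inspect      # prints [4.5, 3.0]
```

The important property is that both lists come back aligned, because the similarity algorithms compare the ratings position by position.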
Currently, we have the Euclidean distance and the Pearson correlation algorithms implemented, and you can see both of them in action by viewing a user’s page in the application. On a user’s page you will see links to perform comparisons with other users based on the two algorithms and, from what I could try, the Pearson correlation seems to be the one giving the best results.
Both algorithms work by loading the lists of common ratings, performing some math on the scores and generating a number between 0 and 1, with 0 meaning no similarity and 1 meaning that the users gave exactly the same scores for all ratings. With this in mind, every data set must include a “score”. It doesn’t have to range from 0 to 5, and it doesn’t even have to be a positive number, but you have to define different values for each of the possible cases. In our movies example the score ranges from 0.0 to 5.0. The current Rating model doesn’t care which model you’re rating, just that the same user can’t rate it twice and that the scores are numbers.
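For reference, here is a sketch of the textbook Pearson correlation in plain Ruby, working on two aligned lists of scores. It is not the code from the application, and the pearson method name is made up for this example. Note that the textbook coefficient ranges from -1 to 1; how the ported version maps that onto the 0 to 1 range described above isn’t shown here.

```ruby
# Textbook Pearson correlation over two equal-length score lists.
# A sketch only; the ported algorithm in the application may differ.
def pearson(first_scores, last_scores)
  n = first_scores.size
  return 0.0 if n.zero?

  mean_first = first_scores.sum.to_f / n
  mean_last  = last_scores.sum.to_f / n

  # Deviations of every score from its own list's mean.
  first_deviations = first_scores.map { |score| score - mean_first }
  last_deviations  = last_scores.map { |score| score - mean_last }

  covariance = first_deviations.zip(last_deviations).sum { |a, b| a * b }
  spread = Math.sqrt(
    first_deviations.sum { |d| d**2 } * last_deviations.sum { |d| d**2 }
  )

  # If one list has no variation the correlation is undefined; return 0.
  return 0.0 if spread.zero?

  covariance / spread
end

puts pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]) # prints 1.0
puts pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) # prints -1.0
```

The first pair of lists moves in the same direction even though the scores themselves differ, which is why Pearson can rate two users as similar when one of them is simply a harsher critic than the other.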
As you might expect, none of the algorithms will work well with small data sets; both users need a bunch of ratings for the results to be really accurate. For instance, two users that have just one rating in common, with the same score for a specific movie, will get a similarity value of 1, even though you surely don’t have enough information to be sure of that.
So, now I’m sailing towards grouping people that look like each other and generating ratings based on this similarity. The sample code for the application is available (some specs are failing, they’ll be fixed) and any ideas for improvements and comments are welcome.
Keep an eye on this blog and the GitHub repo.
PS: Another book that seems to be interesting is this one, my MEAP copy has just arrived and I should be commenting about it soon.
CustomerFu - complaint handling for small business
A quick heads-up on a product we’re launching soon. CustomerFu (www.customerfu.com) is an online tool for companies to manage customer complaints.
What happens in many companies is this: when they’re small, it’s easy enough to follow up complaints through a paper-based system or spreadsheets. But if your complaints are being received by more than one person in your organisation, or you have more than one branch or office, things quickly get out of hand. Bigger companies would start up a call-centre, or outsource to one. But that’s an expensive exercise.
So CustomerFu will provide a centralised tool for companies to manage all of their customer complaints, making sure that none of them go missing and that every complaint is dealt with properly. Without breaking the bank.
We’re in the home straight with development, so more information soon.
Cool stuff with Git
We are currently in heavy development on one of our applications. We wanted to bring the existing deployment up to speed but with some very specific limitations.
I wanted others in the team to be able to clone the branch if they needed to deploy, so I created a new remote branch called “slicehost”
git push origin origin:refs/heads/slicehost
Next I created a local branch called “slicehost”
git branch slicehost
I did a “git checkout slicehost” and proceeded to make the changes I needed to make before deploying.
I modified the deploy.rb file for capistrano2 and deprec2 to contain
set :branch, 'slicehost'
With the changes locally committed to the slicehost branch I asked git to
git push origin slicehost
which pushed the changes to the slicehost branch in our repo.
I ssh’ed into the server with the repo to make sure it worked. With the changes ready for deployment I ran (from my local machine again)
cap deploy
cap deprec:db:migrate
and
cap deprec:mongrel:restart
That was that. I could
git checkout master
and continue with the rest of the application development.
A really simple way to solve the problem.
I’m really enjoying git - it feels right.