Google sitemaps made easy
Now that Talkies is live, we obviously need to get word out that we exist. Part of that is making sure that Google can find all our pages. Since not all movies or actors are accessible directly via a link, we needed to implement a sitemap (sitemaps.org) file that we can submit to Google.
The easiest way to do that was with Alex Rabarts’ big_sitemap gem. The implementation was straightforward, and we’ve got a cron job that runs once a day to keep the sitemaps updated.
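For anyone curious what ends up in such a file, here is a minimal sketch in plain Ruby that builds a sitemaps.org document by hand. It is not the big_sitemap API: the build_sitemap helper and the example URL are made up for illustration, and a real site would let the gem do this work.

```ruby
# Hypothetical helper, NOT part of big_sitemap: builds a sitemaps.org
# <urlset> document from [url, last_modified_time] pairs.
def build_sitemap(entries)
  # Escape the five characters the sitemap protocol requires to be escaped.
  xml_escape = lambda do |text|
    text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
        .gsub('"', '&quot;').gsub("'", '&apos;')
  end

  urls = entries.map do |loc, lastmod|
    "  <url>\n" \
    "    <loc>#{xml_escape.call(loc)}</loc>\n" \
    "    <lastmod>#{lastmod.getutc.strftime('%Y-%m-%d')}</lastmod>\n" \
    "  </url>\n"
  end

  %(<?xml version="1.0" encoding="UTF-8"?>\n) +
    %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n) +
    urls.join +
    "</urlset>\n"
end

# Example with an invented URL:
puts build_sitemap([['http://www.talkies.de/example?a=1&b=2', Time.utc(2009, 1, 15)]])
```

The output can be written to a file in the public folder and submitted to Google, which is roughly what the daily cron job keeps fresh.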
Talkies is live!
This blog has been really quiet lately as we’ve been in the home straight for launching Talkies. Finally, we went live on Sunday night.
Talkies is now up at http://www.talkies.de. In case you don’t speak German, and the pictures don’t give it away, Talkies is a social-networking site focused around movies, movie stars and the entertainment biz. Obviously aimed at German-speakers.
We’ve got exciting things planned for the site. What you see now is only the start!
Updating a new record with an after_save callback
We have a Photo model in which we want to store a list of the actors featured in that photo, to make it easier for Solr to search photos. When a new photo is uploaded, users can associate one or more actors with it. These arrive from our form submission as actor IDs.
So we’d like to have an after_save callback that looks up the actor names and adds them to our special index field. The problem is, if we have this:
class Photo < ActiveRecord::Base
  after_save :set_index_representations
  private
  # Stores the actors' full names in a single field for Solr to index
  def set_index_representations
    update_attribute :index_repr_of_actors, actors.all.collect { |p| p.full_name }.join(" ")
  end
end
then our after_save gets called again after we’ve updated our new field. Oops! Endless loop.
We need a way of updating the record without calling after_save again. The solution is update_all.
def set_index_representations
  names = actors.all.collect { |p| p.full_name }.join(" ")
  # Bind parameters keep names containing quotes (e.g. O'Toole) from breaking the SQL
  Photo.update_all(["index_repr_of_actors = ?", names], ["id = ?", id])
end
Just be sure to specify the condition, otherwise you’ll end up updating all records!
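To see the loop without a database, here is a toy sketch in plain Ruby. ToyRecord and its methods are invented stand-ins for ActiveRecord, not its real API: update_attribute goes through save (and so fires the callback), while write_without_callbacks plays the role of update_all.

```ruby
# Toy stand-in for an ActiveRecord model (hypothetical, no database).
class ToyRecord
  class CallbackLoop < StandardError; end

  attr_reader :attributes

  def initialize
    @attributes = {}
    @depth = 0
  end

  # Runs the after_save callback; gives up once the callback has
  # re-entered save a few times, instead of recursing forever.
  def save
    @depth += 1
    raise CallbackLoop, 'after_save triggered save again' if @depth > 3
    after_save
    true
  ensure
    @depth -= 1
  end

  # Like update_attribute: writes the value, then saves (fires callbacks).
  def update_attribute(name, value)
    @attributes[name] = value
    save
  end

  # Like update_all for a single row: writes the value, no callbacks.
  def write_without_callbacks(name, value)
    @attributes[name] = value
  end
end

class LoopingPhoto < ToyRecord
  def after_save
    update_attribute(:index_repr_of_actors, 'Some Actor')
  end
end

class SafePhoto < ToyRecord
  def after_save
    write_without_callbacks(:index_repr_of_actors, 'Some Actor')
  end
end
```

Calling LoopingPhoto.new.save raises ToyRecord::CallbackLoop, while SafePhoto.new.save returns true with the field filled in.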
Why learning HTTP does matter
It’s interesting to notice how many people work with web applications without understanding the basics of the Internet and the HTTP protocol. You can find applications that exhibit bizarre behavior anywhere; people just forget to read the specs or slept through the HTTP classes at college.
One of the most harmful symptoms of this lack of knowledge is “POST fever”. Every form in the application performs a POST, no matter what it does or what side effects are involved. It just works that way and nobody sees a reason to do otherwise. If you ask them, they’ll probably say “oh, someone told me that the GET method has a size limit on its parameters”.
But, what’s so bad about it?
If you take a look at the HTTP RFC, you will find that the GET method is described as a “safe” method. Safe, in the HTTP context, means that performing a GET on a web application should have no side effects: it should not change the resource you are requesting. The whole idea of GET is that you just get a copy of the resource at that specific URL. You’re not doing anything funny with it, so you should be able to retrieve it anywhere and anytime you want.
But if you look at the POST method description, you’ll see it is not a “safe” method. If you send a POST to a URL you may well be changing something and generating an evil side effect that might render the whole application useless and bring Skynet and the Terminators to lay Armageddon on Earth. Or you might just be creating a new resource, such as a blog post like this one.
The obvious difference is that POSTs can (and usually should) change the state of something on the server side, while a GET should never do something like that. If you’re familiar with SQL databases, GETs are just like “select” commands and POSTs like “insert” commands. Have you ever seen an “insert” returning a result set or a “select” inserting data? Me neither.
But bear with me, it GETs even worse. Imagine that you’re the owner of that evil website I mentioned, the one that uses only POSTs in its forms, and one of those forms is a search form. Users will use it to search for your products and add them to their shopping carts. A user wants to buy the new AC/DC record but he’s not sure about its name, so he just types AC/DC and hits enter.
Voila!
There, at the top of the list, is “Black Ice”, their new record (Have you already bought yours?). He clicks on the link and while he’s viewing the CD page he remembers that he hasn’t bought the “Stiff Upper Lip” album. “Let me hit the back button and look for it too”, thinks the poor user, and when he hits the button, the browser shows an interesting message:
“The browser will need to send data to the server to perform this action. Are you sure you want to do this?”
The user stares at the message, terrified. “What have I done? Will they bill me for this? Are they going to send me the new Britney Spears album ‘cos I’m trying to hit the back button?”
As the HTTP protocol mandates, POSTs are not safe, and the tools (usually our browsers) should warn the user that something bad might happen if they POST by accident. That’s exactly what happens when you hit the back button after a POST. In this example the user wouldn’t be doing anything wrong, but if instead of a search page he were coming back to an “add client” page, going “back” would make him re-create the last client he sent to the database, which isn’t what anyone wants.
Worse, if you’re using POST in a search form, your users can’t comfortably use the back button (and the usability gurus say that it’s the most used feature in browsers) and they can’t bookmark the search results! Can you imagine something worse than that? You are keeping people from expressing their love for your website by posting it in their del.icio.us favorites!
Now, the reasoning is simple: if you’re not changing anything on the server side, you should always perform GETs. They don’t break the back button, they let users bookmark their pages and they don’t make the browser show any funny messages. If you’re changing state on the server side you should definitely use POST (or the other HTTP methods designed to change state, like PUT and DELETE). GET requests should NEVER change any state on the server side.
And before I forget, after every successful POST you should REDIRECT the user to a new page instead of rendering the page directly in response to the POST. Redirecting to the “response” page means that hitting the “back” or reload button afterwards won’t resubmit the data the user already sent during the last POST.
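The GET/POST split and the redirect-after-POST rule can be sketched as a tiny Rack-style application, that is, a callable taking an env hash and returning status, headers and body. No gems are needed to try it. The /clients path, the CLIENTS array and the idea of sending the client name as the raw request body are assumptions made for this example.

```ruby
require 'stringio'

# In-memory stand-in for a database table (assumption for the sketch).
CLIENTS = []

# Rack-style app: env hash in, [status, headers, body] out.
APP = lambda do |env|
  method = env['REQUEST_METHOD']
  path   = env['PATH_INFO']

  if method == 'GET' && path == '/clients'
    # Safe: only reads state, so it can be repeated and bookmarked.
    [200, { 'Content-Type' => 'text/plain' }, [CLIENTS.join("\n")]]
  elsif method == 'POST' && path == '/clients'
    # Changes state, then redirects (303 See Other) instead of rendering,
    # so back/reload lands on the GET and does not resubmit the data.
    CLIENTS << env['rack.input'].read
    [303, { 'Location' => '/clients' }, []]
  else
    [404, { 'Content-Type' => 'text/plain' }, ['Not found']]
  end
end

# Simulated requests:
post = { 'REQUEST_METHOD' => 'POST', 'PATH_INFO' => '/clients',
         'rack.input' => StringIO.new('ACME Inc.') }
status, headers, _body = APP.call(post)
puts "#{status} -> #{headers['Location']}"
```

After the POST the browser follows the Location header with a plain GET, which is the page that ends up in the history.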
Rareshare - a social network with a difference
Our latest project to go live is a unique social network for Rareshare.org. Rareshare is a set of micro-communities, one for each of the rare disorders listed on the site.
Sufferers and those affected by the various disorders can find and share information on the disorders, and communicate with other similarly affected individuals.
The site is in beta, but if you know anyone who is affected by any of the listed disorders, or are a sufferer yourself, then please sign up.
Congratulations to David on this great idea. We wish the site every success!
Cool stuff with Git
We are currently in heavy development on one of our applications. We wanted to bring the existing deployment up to speed but with some very specific limitations.
I wanted others in the team to be able to clone the branch if they needed to deploy, so I created a new remote branch called “slicehost”:
git push origin origin:refs/heads/slicehost
Next I created a local branch called “slicehost”
git branch slicehost
I did a “git checkout slicehost” and proceeded to make the changes I needed to make before deploying.
I modified the deploy.rb file for capistrano2 and deprec2 to contain
set :branch, 'slicehost'
With the changes committed locally to the slicehost branch, I asked git to
git push origin slicehost
which pushed the changes to the slicehost branch in our repo.
I ssh’ed into the server with the repo to make sure it worked. With the changes ready for deployment I ran (from my local machine again)
cap deploy
cap deprec:db:migrate
and
cap deprec:mongrel:restart
That was that. I could
git checkout master
and continue with the rest of the application development.
A really simple way to solve the problem.
I’m really enjoying git - it feels right.
Interesting behaviour in Ruby’s division and modulo operators
My copy of The Ruby Programming Language has finally arrived and of course I started reading it. One of my first findings about the language semantics has to do with mathematics, more specifically, division and modulo.
As a long-time Java programmer, most of my expectations about math in programming languages come from that background, so some of the behaviors of mathematical functions in Ruby really scared me. Let’s start with a simple example, a basic division. Imagine that you have -7 and you want to divide it by 3. Since 3 doesn’t divide -7 evenly, we have to settle for an approximation.
In my math classes, I learned to find this approximation by multiplication: multiply 3 by the integer that gets as close as possible to -7. The closest I can get is with -2, as -2 multiplied by 3 is -6, leaving a remainder of -1, as -1 plus -6 is -7.
A lot of numbers, right?
Open irb on a command line and write:
-7/3
So, are you getting -2? No?
No, you are not. In Ruby, -7/3 is -3.
How the hell does this happen?
In Ruby, unlike C/C++ and Java, dividing two integers when one of them is negative yields the result of a floating point division rounded towards negative infinity. I haven’t found an explanation of why it is done this way (comment if you have any hints), but other languages like Python and Tcl behave in the same way.
Ideally, if A divided by B gives a result of C and a remainder of D, then A equals ((C * B) + D), so this simple division would break the whole mathematical equivalence between multiplication and division, right?
Not so fast. The modulo (%) operator also behaves differently when a negative number is involved. If you run (-7 % 3) you will get 2, and ( (-3 * 3) + 2 ) is exactly -7. So the operators stay consistent with each other; they just don’t behave the way I was expecting them to.
Another interesting thing is that if you want a modulo operator that works just like Java’s, you can call the remainder method as in:
-7.remainder(3)
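Here is the whole comparison in one small script you can paste into irb. The variable names are only for illustration; the methods are all plain Integer methods.

```ruby
a = -7
b = 3

floored_quotient = a / b            # => -3 (rounds towards negative infinity)
floored_modulo   = a % b            # => 2  (takes the sign of the divisor)
both_at_once     = a.divmod(b)      # => [-3, 2]

java_remainder   = a.remainder(b)   # => -1 (takes the sign of the dividend)
java_quotient    = a.fdiv(b).truncate # => -2 (rounds towards zero, as Java does)

# Both pairs satisfy A == (C * B) + D, they just pick a different C and D:
puts floored_quotient * b + floored_modulo == a   # true
puts java_quotient * b + java_remainder == a      # true
```

So neither convention breaks the arithmetic; the floored pair keeps the remainder non-negative for a positive divisor, the truncated pair keeps the quotient closer to zero.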
This isn’t going to blow your mind or change your life forever, but it’s an interesting behavior that I hadn’t noticed before.
PS: If you have any idea about why this happens, just drop a comment.
Git post-commit notification to Campfire
Recently we switched from Subversion to Git, and the thing I missed the most was the post-commit notifications we had popping up in Campfire. So, I added this to the .git/hooks/post-commit file in the project I’m working on:
#!/usr/local/bin/ruby
require 'rubygems'
require 'tinder'

commit_author = `git show --pretty=format:"%an" HEAD | sed q`.chomp
commit_log = `git show --pretty=format:"%s" HEAD | sed q`.chomp
commit_date = `git show --pretty=format:"%aD" HEAD | sed q`.chomp
commit_changed = `git-diff-tree -r --name-status HEAD`

campfire = Tinder::Campfire.new 'your_account'
campfire.login 'your_username', 'your_password'
room = campfire.find_room_by_name('your_room_name')
room.paste %(Commit by #{commit_author}\nDescription: #{commit_log}\n\nChanged Files:\n#{commit_changed})
room.leave
You’ll need to install the tinder and hpricot gems too, and post-commit must be executable:
chmod 744 .git/hooks/post-commit
Now this will send a notification to Campfire each time you complete a local commit.
[Thanks to this pastie for the right git commands]
Rails deployment with Apache and mod_rails on Ubuntu Gutsy (7.10)
If you have ever deployed a Rails app, you have probably used a mongrel cluster running behind a proxy server (usually Apache, Lighttpd or, less likely, Nginx). While this isn’t painfully difficult, it makes Rails applications harder to deploy than PHP, where you just send your files to the server, or Java, where you bundle your app in a .war file and place it in the server’s deployment folder.
The biggest problem with the mongrel cluster approach is that you have to take care of at least two processes. Although it was possible to deploy your Rails applications with only an Apache server (using FCGI), this wasn’t a good approach as it hurt your application’s performance.
But now, we have a true option to run our Rails applications using just an Apache server, and the option is called (tadá!) mod_rails!
mod_rails (or Passenger) is an Apache module that aims to enable seamless deployment of your Rails applications using only an Apache server. No proxies, no (visible) clusters, no other processes to handle: just copy your Rails application to the folder defined in Apache’s virtual host configuration and be done with it.
But how does it work?
What mod_rails does is automatically manage a cluster of Rails application instances inside your Apache server, so you get the functionality and performance advantages of running a mongrel cluster without having to manage one. Something that really makes mod_rails special is that your Rails applications are independent from the Apache server: if your application blows up, the main server won’t go down the tubes. If you want a full architectural overview of how it works, take a look at their “Passenger architecture document”.
And now, enough talking, let’s get our hands dirty and prepare the environment to deploy a Rails application to mod_rails. This tutorial is aimed at Ubuntu 7.10; it should probably work if you’re running 7.04 or maybe even a 6.x, but I can’t guarantee that.
If you don’t have Ruby yet…
If you’re going to do this on a brand new (aka. virgin) server, you will have to install some things (like Ruby) first. Log in to the machine and start typing:
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get autoremove
This will update your system and take the garbage away. Then you go:
sudo apt-get install build-essential -y
This will install the software needed to build other things (like your native gems). After that, it’s time to install your database (in our case it’s MySQL, but you can pick another one, I promise I won’t feel bad about it) and Ruby. You can do this by typing:
sudo apt-get install mysql-server mysql-client libmysqlclient15-dev libmysql-ruby1.8 ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8 libreadline-ruby1.8 libruby1.8 libopenssl-ruby libopenssl-ruby1.8 libdbd-mysql-perl libdbi-perl libmysqlclient15off libnet-daemon-perl libplrpc-perl mysql-client-5.0 mysql-common mysql-server-5.0 zlib1g-dev
It’s possible that the Ruby installer hasn’t added the symlinks, so if typing “ruby” doesn’t work, try this:
sudo ln -s /usr/bin/ruby1.8 /usr/local/bin/ruby
sudo ln -s /usr/bin/rdoc1.8 /usr/local/bin/rdoc
sudo ln -s /usr/bin/ri1.8 /usr/local/bin/ri
sudo ln -s /usr/bin/irb1.8 /usr/local/bin/irb
With ruby installed, it’s time to install RubyGems. You can install RubyGems from apt, but it’s better to download it and perform a manual installation. There you go:
wget http://rubyforge.org/frs/download.php/35283/rubygems-1.1.1.tgz
tar xvzf rubygems-1.1.1.tgz
cd rubygems-1.1.1
sudo ruby setup.rb
After installing it, you also have to add a symlink:
sudo ln -s /usr/bin/gem1.8 /usr/bin/gem
After all this typing, you must be really tired, so now comes the easy part.
As you’re planning to perform a Rails deployment, you are probably using Capistrano (why wouldn’t you use it?), so I have some recipes to make you type less, a LOT less. First, install the Apache 2 server, Apache’s development headers, the Apache Portable Runtime, and finally the Rails and Passenger gems:
desc 'Installs apache 2 and development headers to compile passenger'
task :install, :roles => :web do
puts 'Preparing the environment'
puts 'Installing apache 2'
sudo 'apt-get install apache2 apache2.2-common apache2-mpm-prefork apache2-utils libexpat1 ssl-cert libapr1 libapr1-dev libaprutil1 libmagic1 libpcre3 libpq5 openssl apache2-prefork-dev -y'
puts 'Installing needed gems'
sudo 'gem install fastthread rake rails passenger'
end
This task will install the Apache 2 server (even if you already have Apache 2 installed, you should run this task to be sure that you also have the development libraries installed) and the required gems. After this, you’re almost there. Log in to your server again and type:
passenger-install-apache2-module
You will answer some questions (and you probably won’t need to install anything, as we have already installed all the software needed). When the script is done, take note of what it says, which means copying the values it prints to your /etc/apache2/httpd.conf file. It should look like this:
LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-1.0.1/ext/apache2/mod_passenger.so
RailsSpawnServer /usr/lib/ruby/gems/1.8/gems/passenger-1.0.1/bin/passenger-spawn-server
RailsRuby /usr/bin/ruby1.8
RailsMaxPoolSize 2
On the first line, we are telling Apache to load the passenger_module (this is mod_rails); the next ones are mod_rails settings. RailsSpawnServer is the path to the executable that starts the Rails servers, RailsRuby is where your Ruby executable is and RailsMaxPoolSize is how many application instances (just like the mongrel instances) you want mod_rails to start. Don’t leave RailsMaxPoolSize blank, as its default value is 20 (yeah, TWENTY) and you probably don’t have enough memory for all 20 Rails application instances.
And we’re done!
Well, almost
Now that you have Apache configured and mod_rails (Passenger) being loaded, we have to tell Apache about our application. We do this using a virtual host configuration, but we are not going to write it with our own hands, oh no. There is another task to do this for us:
desc 'Creates a virtual server configuration on apache to your application'
task :create_server_config, :roles => :web do
template = File.read( File.dirname(__FILE__) + '/vhost_config.erb' )
buffer = ERB.new(template).result(binding)
puts 'Rendering template file'
put buffer, "#{shared_path}/#{application}-vhost"
puts 'Copying virtual server config to apache folder'
sudo "cp #{shared_path}/#{application}-vhost /etc/apache2/sites-available/#{application}-vhost"
puts 'Enabling the site on apache'
sudo "a2ensite #{application}-vhost"
end
This task uses an .erb file called vhost_config.erb that should be in the same directory as the file where this task is defined. Here’s the template:
<VirtualHost <%= domain %>:80>
ServerName <%= server_name %>
DocumentRoot <%= deploy_to + '/current/public' %>
<Directory "<%= deploy_to + '/current/public' %>">
Options FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>
</VirtualHost>
It uses our own configuration (from deploy.rb) to define the virtual server. When we call this task, it will not only generate this virtual host config, but also copy it to the sites-available and then call “a2ensite application-vhost” installing the application on apache.
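If you want to check what the template produces before letting Capistrano loose on the server, the rendering step can be tried on its own with nothing but Ruby’s standard erb library. This is only a sketch: render_vhost is a made-up helper, the template is a shortened copy of the one above, and the values passed in are invented (in the real task they come from deploy.rb through binding).

```ruby
require 'erb'

# Shortened copy of the vhost template, inlined for the sketch.
VHOST_TEMPLATE = <<~TEMPLATE
  <VirtualHost <%= domain %>:80>
    ServerName <%= server_name %>
    DocumentRoot <%= deploy_to + '/current/public' %>
  </VirtualHost>
TEMPLATE

# Hypothetical helper: the method arguments become local variables that
# the template can see through `binding`, just like the Capistrano
# variables do inside the task.
def render_vhost(domain, server_name, deploy_to)
  ERB.new(VHOST_TEMPLATE).result(binding)
end

# Invented example values:
puts render_vhost('*', 'www.example.com', '/var/www/myapp')
```

Rendering locally like this makes it easy to spot a wrong path or server name before the file lands in sites-available.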
Then, you can just restart apache with the following task:
task :restart_apache do
puts 'Restarting the apache server'
sudo 'apache2ctl restart'
end
And you’re done: your Ubuntu server is running your Rails application without any mongrels or anything else to manage besides Apache. Once your application is running, you can restart it with the following task:
desc 'Restarting the application'
task :restart_app do
puts 'Restarting the application'
run "touch #{deploy_to}/current/tmp/restart.txt"
end
Whenever you want to restart your app without restarting Apache, it’s just a matter of touching the “tmp/restart.txt” file inside your application’s current release folder (it’s just a blank text file, and touch creates it if it doesn’t exist yet).
After all this, once you have another application to deploy to the same Apache server, you just generate a new virtual host file for it and it will be running without any other configuration or anything to manage. Could this be any better?
So, what about the good old mongrels?
mod_rails isn’t going to replace Mongrel all over the world, because it’s a Rails-only solution (although it can probably be tweaked to run other frameworks). What the guys at mod_rails are doing is integrating a Rails cluster inside Apache itself, without the need to run a separate server cluster. But as this is a necessity generated by Rails’ single-threaded model, other frameworks, like Merb, will keep on using mongrel as their application server.
Another reason to look beyond mod_rails is when you already have a cluster of Rails applications running on many computers and proxied by a common HTTP server. If you really need to distribute the load across many machines using one as a load balancer, it will not be as easy as I’m showing here. But even behind a proxy, using Apache on the application servers might help the performance of static content delivery.
Acknowledgements and references
Most of the installation instructions that you found here were taken from a post by Vince Wadhwani (Thanks Vince!).
If you are not going to install mod_rails on an Ubuntu machine, check out the mod_rails documentation.
Unit tests don’t guarantee that your system works
Last week there was an interesting message on the RSpec users list. The most interesting part of it is the following:
“I also had to go into specs on a project I’m not working on, and found an unholy hive of database-accessing specs. It’s disheartening. Basically, it’s cargo cult development practices - using the “best practice” without actually understanding it.”
You might have read this before, “/specs|tests/ that access the database are evil”, but have you ever asked yourself why?
Behavior Driven Development is the next step after Test Driven Development and it borrows many best practices found in the latter. The two principles that interest us most in this conversation are test-first development and unit testing.
The idea behind test-first development is that before writing your code, you should write a test stating what you want your “future” code to do. By writing the test before the code you get to work on the public interface provided by your object. The test is the first client of your code, so if your public interface is cumbersome or difficult to use, the test will catch a bad idea before it materializes in your code.
And where is unit testing in all this? You should be doing test-first using unit tests, as unit tests will guarantee that the code you wrote for that single unit (a method, probably) works alone. If other objects are needed to test this specific behavior, you should use mock objects (fake objects) in their places, so you won’t be testing them in your unit test. Remember, unit tests should only test a unit of code, no more than that. We do it this way so we don’t get distracted by the implementation of other objects; we focus on testing our target, not its dependencies.
When we’re writing specs for our objects they should usually work as unit tests, they should only assert the behaviors of a single unit of code, everything else should be done using mocks and stubs. But I said usually.
As I said before, unit tests, and your common specs, should only assert the behaviors of a unit of code without considering its relationships with the other objects in the system, but this only guarantees that they work as units. It will never guarantee that they really work when in real contact with the other objects in the system. Unit tests don’t guarantee that your system works; they surely help you reach this goal, but they aren’t enough.
And what does this have to do with that message, anyway?
A spec that accesses the database is just like an integration test: it asserts that the code being tested works fine when integrated with the database. So the integration tests are the ones that really show you that your code works as a system, not only as a group of lonely objects.
I’m not saying that you should leave the unit tests behind, because they are very important in helping you design your code and be sure that it works as a unit. But you shouldn’t rely only on them to test your system. A good suite of integration tests will give you the confidence that everything works fine together.
And sometimes you can’t unit test a piece of functionality; it’s all about integration. Let’s take the “validates_uniqueness_of” validation in ActiveRecord as an example. If you’re writing a spec for your ActiveRecord model, you should add one ‘it’ statement showing that this is needed (you’re specifying how your model behaves, remember?), so here’s how it could look:
it 'Should not be valid if there is another one with the same name' do
@common_name = 'testuser'
@user = User.create( :name => @common_name )
@another_user = User.new( :name => @common_name )
@another_user.should have(1).error_on(:name)
end
How could you perform this spec without touching the database?
First, you could look at the “validates_uniqueness_of” source code, figure out how it works and stub it to return what you want, but this is bad because if the framework code changes your specs would break. The other way would be changing the database adapter to a mocked one and sending exactly the result you wanted, but this is basically overkill. So why don’t you just leave the “purism” behind, test it against your database and be happy that your code works fine?
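The difference between the two kinds of test can be sketched in plain Ruby, without ActiveRecord or RSpec. Everything here is hypothetical: UniqueNameValidator, StubStore and InMemoryStore are invented for the example, with InMemoryStore standing in for the real database.

```ruby
# Hypothetical validator: works with any store that answers name_taken?(name).
class UniqueNameValidator
  def initialize(store)
    @store = store
  end

  def errors_for(name)
    @store.name_taken?(name) ? ['has already been taken'] : []
  end
end

# Hand-rolled stub for the unit test: always gives the canned answer.
class StubStore
  def initialize(taken)
    @taken = taken
  end

  def name_taken?(_name)
    @taken
  end
end

# Stand-in for the real database in the integration-style check.
class InMemoryStore
  def initialize
    @names = []
  end

  def create(name)
    @names << name
  end

  def name_taken?(name)
    @names.include?(name)
  end
end

# Unit style: proves the validator reacts to the store's answer, nothing more.
unit_errors = UniqueNameValidator.new(StubStore.new(true)).errors_for('testuser')

# Integration style: proves validator and store actually work together.
store = InMemoryStore.new
store.create('testuser')
integration_errors = UniqueNameValidator.new(store).errors_for('testuser')

puts unit_errors.inspect
puts integration_errors.inspect
```

The stubbed check would keep passing even if the store’s lookup were broken, which is exactly the gap the database-touching spec closes.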
One important thing to notice is that integration tests are also slower to run, so you won’t want to wait for the full suite to run before every commit. Usually you would run the unit and integration tests that are most likely to break if you did something wrong, the ones related to what you’re doing now, and be done with it.
So, if you’re on a project that has database-accessing specs or specs that use many real objects (and not mocks), don’t feel bad, but be sure that whoever wrote them knows what they are doing and that everything that can be unit tested is being unit tested. Integration tests should be written after your functionality is implemented and tested with unit tests. The two are not interchangeable, and you will not replace one with the other.
And be sure to never commit your code before running your tests.