Saturday, May 03, 2008

Errno::EPIPE (Broken pipe) MySQL Error in Rails

I've been working professionally with Ruby on Rails for a few months now, mostly in a SysAdmin capacity. During that time, I've seen some weird errors which I had not seen anywhere else. Time rolled on and now those things don't look weird at all. Actually, I should have looked more carefully at first. Later I did, and found my way through. So here's some of the stuff I found. Hope this saves someone some time.

(My Ruby servers post is coming shortly... really, and it will include details about Thin and Passenger too. Actually, I was waiting for Phusion Passenger AKA mod_rails to be released. As a quick peek at the post: I'm currently running Thin in production and also evaluating Passenger.)

Update: I've moved about 6 apps to Passenger. So far so good. Thin is still my first choice though. You can look forward to the post along with some Capistrano scripts too, ...soon. :)

1. Errno::EPIPE (Broken pipe - The Major Pain in the Neck)

The team I'm working with uses MySQL extensively. I'm glad they opted for an Open Source DBMS rather than doing what a lot of Sri Lankan IT firms do (i.e. running the Unity Plaza edition of MS SQL Server). Since Sun is considering closed-sourcing parts of MySQL, I'll be on their nerves pushing toward PostgreSQL. I've always preferred Postgres (whose roots turn 20 years old next year), and even Sun says that PostgreSQL is the most advanced Open Source DBMS. Now that I've mentioned it, expect the removal of that web page soon. :) Back to the topic.

We had a testing server which crashed overnight. It was a Linux (CentOS 5) installation, so I was pretty sure it wasn't the OS. The web app worked as usual when we started it, and kept working fine. But when we returned the next morning... the web app was down, displaying an "Application Error" page.

Day one: I ignored it. Don't blame me; I had other servers to manage. Being the only Linux admin might be a privilege, but not always. After all, the server was running an application under heavy development. In fact, it had several known application errors. I might not have even read the logs.

Day two, day three, day four... OK, there's something wrong. So I started digging through the app logs. There it was: a broken pipe (literally).

The error read as Errno::EPIPE (Broken pipe). Quick Googling showed me that it had been reported before, even in the pre-Mongrel era of Rails. At this point I was moving to Thin as my preferred Rails backend, so I mailed the friendly Thin Google Group. I will not go into the detailed discussion here; anyone interested can read it at the above link.

So this is the problem (I had help figuring out what it actually was):
  • The error was occurring due to 'something' in the MySQL driver
  • The actual error was the termination of the app's DB connection, due to inactivity

After discussing with the Thin group and checking a lot of web pages, these were the only solutions that actually seemed like solutions. Which means I'm going to omit the parts where it was advised to paint your face with salamander blood on a full-moon night and dance around a parking lot.

  • Set ActiveRecord::Base.verification_timeout = 14400 (or any value lower than the MySQL server's interactive_timeout setting) in config/environment.rb. Or,
  • Create a sleeper thread which uses the DB connection periodically, e.g.:

      Thread.new do
        loop do
          sleep 3600  # anything shorter than the server's timeout
          ActiveRecord::Base.connection.select_value('select 1')
        end
      end
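The sleeper-thread idea can be tried as a standalone Ruby sketch too. The helper name and the counter demo below are mine, purely for illustration; in the Rails app the block would be the ActiveRecord select_value call:

```ruby
# Hypothetical keep-alive helper: runs the given block every `interval`
# seconds on a background thread, rescuing failures so that one bad
# ping doesn't kill the thread.
def start_keepalive(interval, &ping)
  Thread.new do
    loop do
      sleep interval
      begin
        ping.call
      rescue StandardError => e
        warn "keep-alive ping failed: #{e.message}"
      end
    end
  end
end

# Demo with a counter instead of a real DB query:
pings = 0
thread = start_keepalive(0.01) { pings += 1 }
sleep 0.1
thread.kill
puts "pinged #{pings} times"
```

In a Rails app you would start this once at boot (e.g. from config/environment.rb) with the block doing `ActiveRecord::Base.connection.select_value('select 1')`.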

I tried both. Sometimes they seemed to help, but the crashing was never completely eradicated. I was getting really frustrated. There seemed to be no other pragmatic solution, and people were starting to doubt whether Rails was enterprise ready. I desperately had to do something. So I went back to the basics and started working up. This is when I remembered that there are two MySQL drivers for Ruby.

This is the point I recalled installing Ruby/MySQL, since a part of the application required access to a native MySQL driver. And the error was certainly being generated from mysql.rb. There, I had a break. So, as the next natural step, I removed Ruby/MySQL and installed MySQL/Ruby. Although both drivers are maintained by the same person, I had hope for a fix in the C driver.

Removing Ruby/MySQL proved to be as simple as deleting mysql.rb from its installation location. Installing MySQL/Ruby was a little trickier. The site listed a two-step build process, but the first step came in 3 alternative versions:
  • % ruby extconf.rb
  • % ruby extconf.rb --with-mysql-dir=/usr/local/mysql
  • % ruby extconf.rb --with-mysql-config

The steps that worked for me were
  • % ruby extconf.rb --with-mysql-config
  • % make

At this point, I could make sure it worked by running the bundled test script:
  • % ruby ./test.rb -- [hostname [user [passwd [dbname [port [socket [flag]]]]]]]

Then finally,
  • % sudo make install

That's it. It solved my problem. It's been several weeks now and the application is running fine with the new MySQL driver. I know this is not a proper fix for the underlying problem, but so far it has proved better than anything else I found on the Internet. I hope someone else will find this useful too.

2. Proxy Errors

This second error is not related to MySQL at all, but I'll just mention it. It's more of a blunder on my side than an actual error.

The same application started giving out proxy errors. It wasn't all of a sudden. I had seen that error before when one or more Mongrel instances in a Mongrel Cluster died. So I just restarted the whole Mongrel Cluster and informed the developers. This was at the peak of the annoyance over that MySQL error, so we were more concerned about that.

But the issue turned out to be a more severe pain than I'd hoped. When developers complained about constant proxy errors, I knew I had to go back to the logs. However, without much delay I figured out where I'd made the mistake. Since the MySQL issue was solved, my mind was relaxed enough to notice the stupid mistake I'd made.

Earlier, my Mongrel instances were running from port 8000 to 8003, so my Apache proxy/proxy_balancer configuration looked more like this:
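Something along these lines (I can't reproduce the exact listing here, so the balancer name and hosts below are placeholders, not my real config):

```apache
<Proxy balancer://mongrel_cluster>
  BalancerMember http://127.0.0.1:8000
  BalancerMember http://127.0.0.1:8001
  BalancerMember http://127.0.0.1:8002
  BalancerMember http://127.0.0.1:8003
</Proxy>

ProxyPass / balancer://mongrel_cluster/
ProxyPassReverse / balancer://mongrel_cluster/
```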

But at one point during configuration and tuning, I thought it would make more sense to run the Mongrels from port 8001 to port 8004. I went ahead and reconfigured the Mongrel cluster so that it ran on ports 8001-8004. During that time I was testing both Nginx and Apache back and forth on the same server, so the web server configurations were changing all the time. Eventually this left me with the mod_proxy/mod_proxy_balancer configuration shown above.

It was a funny situation: Apache was looking for ports 8000-8003 while the Mongrels were serving ports 8001-8004. This resulted in the Mongrel instance on 8004 going unused, and Apache forwarding requests to a port (8000) where nothing was running. That is why the proxy error was regular and consistent. :) Fortunately, I found this before someone else did and saved myself from the ridicule.

So the next time you get a proxy error which seems recurring and consistent, don't forget to check your backend configs (e.g. Mongrel, Thin, Ebb, etc.) against the web server proxy configurations (e.g. Apache, Nginx).
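A quick way to see which ports actually have something listening, before digging through configs, is a small Ruby one-off like this (the host and port range are just the ones from my setup above):

```ruby
require 'socket'

# Returns true if something is accepting TCP connections on host:port.
def listening?(host, port)
  Socket.tcp(host, port, connect_timeout: 1) { true }
rescue StandardError
  false
end

# Check the backend ports from the story above (8000-8004).
(8000..8004).each do |port|
  status = listening?('127.0.0.1', port) ? 'listening' : 'nothing there'
  puts "port #{port}: #{status}"
end
```

Compare the output against the ports your proxy/balancer config points at, and any mismatch jumps right out.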